Chapter 1 


Introduction 


This report is concerned with the development and the formal specification and ver- 
ification of a fault-masking and transient-recovery model appropriate to the repli- 
cated computers in digital flight-control systems (DFCS). The culmination of the 
verification is a mechanically checked theorem which establishes, subject to certain 
carefully stated assumptions, that faults among the component computers of the 
DFCS will be masked — so that the commands sent to the actuators will be the 
same as those that would be sent by a single computer that suffers no failures. 

In order to make this report accessible to those unfamiliar with fault-tolerant 
process-control systems, we begin this chapter with a brief exposition of DFCS, 
and then present the rationale for the particular model that is the focus of our 
formal investigation. (See [1] for a full treatment of digital avionics systems, [2] for 
a treatment of general validation issues, and [3] for a description of current practice 
in the verification and validation of software for DFCS.) 

The second chapter presents the model formally, in the manner of a conventional 
mathematical development. The proof of the fault-masking and transient-recovery 
theorem is presented in the same way. 

The third chapter outlines the fully formal specification of the model, and its 
mechanically checked verification. These were undertaken using the Ehdm formal 
specification and verification system [4-8]; the Ehdm specification text and related 
material are given in the Appendices. 

The fourth chapter discusses the relationship between the model employed here 
and the similar one developed by Di Vito, Butler, and Caldwell of NASA Langley 
Research Center [9, 10]. The fifth and final chapter presents our conclusions and 
recommendations for further work. 




1.1 Digital Flight-Control Systems 

Increasingly, modern aircraft rely on Digital Flight-Control Systems: computer
systems that interpret the pilot’s control inputs and send appropriate commands to the 
control surfaces and engines. 1 Depending on the aircraft design, DFCS may manage 
all, or merely some, of the control surfaces and may or may not have back-up sys- 
tems comprising either analog computers or conventional mechanical and hydraulic 
systems. The advantages claimed for DFCS include the following: 

Safety: DFCS can prevent the pilot stalling the plane, or otherwise taking it be- 
yond its control envelope. For example, the F16 provides yaw-rate limitation 
to prevent the aircraft entering a certain flat spin mode that has “unaccept- 
able recovery,” and rudder fade-out to ensure that “pilots could not get in 
trouble because of flying habits developed in other aircraft” [11]. Similarly, 
the Airbus A320 DFCS provides “stall/windshear protection and protection 
also against overspeed and overstress . . . the A320’s system automatically pre- 
vents the aircraft leaving its safe-flight envelope at any point, whether pilot 
error or incompetence, engine malfunction, or the elements have brought it 
to that point” [12] (but see also [13]). Other contributions to safety may in- 
clude reduction in pilot workload through increased automation and improved 
handling. 

Economy and performance: Elimination of heavy hydraulic and mechanical 
control linkages reduces aircraft weight and thereby improves fuel-efficiency 
and load-carrying capacity [14]. Optimum control of engine thrust and angle 
of attack can also reduce fuel consumption significantly. 

Efficiency and performance can sometimes be gained at the expense of han- 
dling qualities. DFCS can restore neutral handling characteristics to such 
aircraft. Maneuverability in unusual flight regimes (e.g., post-stall) may re- 
quire complex transformations between command inputs and actuator outputs 
that can only be achieved by computer control. For example, roll commands 
in the X31 at high angles of attack are interpreted relative to the velocity 
vector, not the longitudinal axis of the aircraft. Thus at 90° angle of attack, 
a pure roll command translates to a pure yaw in the body axis [15]. In the 
limit, high maneuverability, stealth, or other requirements for military aircraft 
may best be achieved with an unstable airplane — which will require computer 
control in order to fly at all. 

1 The popular term fly-by-wire (FBW) covers both DFCS and similar, earlier, systems that 
employ analog computers. Fly-by-light is simply FBW in which fiber-optic cables replace the 
copper wires used to route signals around the aircraft. 
Damage control: The loss of a control surface or engine sometimes results in a 
crash, not because the airplane is absolutely uncontrollable, but because its 
pilot is unable to learn how to control it in the time available. For example, 
it is very hard to control a twin-engined light plane if one of the engines fails, 
and private pilots often crash in this circumstance. A DFCS can partially 
compensate for the massive change in flying characteristics caused by failure 
or damage and thereby assist the pilot to make a safe landing. Simulations 
have been performed to investigate the efficacy of such systems for military 
aircraft suffering battle damage [16]. 

The perceived advantages of DFCS are such that they are employed in almost 
all modern high-performance western military aircraft. Modern western passenger 
aircraft generally have full-authority digital engine controls (FADEC); digital
autopilot, autolander, and flight management system; and digital control of secondary 
surfaces and functions, such as air brakes, spoilers, yaw damping, and gust allevia- 
tion. However, the Airbus A320 is the only passenger aircraft in service with a full 
DFCS — that is, one controlling primary control surfaces in the pitch and roll axes. 2 
Forthcoming passenger aircraft such as the Boeing 777 will also employ comprehen- 
sive DFCS. 

The greater the benefit provided by DFCS, the less plausible it becomes to 
provide adequate back-up systems employing different technologies. For example, 
the DFCS of an experimental version of the F16 fighter (the “Advanced Fighter 
Technology Integration” or AFTI-F16) provides control in flight regimes beyond 
the capability of the simpler analog back-up system. Extending the capability of 
the back-up system to the full flight envelope of the DFCS would add considerably 
to its complexity — and it is the very simplicity of that analog system that is its 
chief source of credibility as a back-up system [17]. Similarly, direct manual control 
of flight surfaces is unlikely to be available if elimination of heavy mechanical and 
hydraulic systems was a primary reason for installing DFCS in the first place. Thus, 
the Airbus A320 has mechanical links to only the rudder and the elevator trim-tab
[14, 18] and is given no certification credit for these back-up systems by the
FAA. 


1.2 Fault Tolerance for DFCS 

It is clear that extreme reliability must be required of DFCS. A much-quoted figure 
is a requirement for passenger aircraft that the probability of catastrophic failure 
during a 10 hour flight should be less than $10^{-9}$ per hour [19]. Such reliabilities 
are beyond those that can be guaranteed for individual digital devices. Not only 

2 The Concorde, which received FAA certification in 1969, has analog FBW with mechanical 
backup in all three primary axes. 
must occasional latent manufacturing defects and the effects of aging be considered, 
but also environmental hazards such as power-supply surges, lightning strikes, and 
cosmic rays (which can cause single-event-upsets, or SEUs). These factors conspire 
to yield an overall reliability well below that required. It follows that some form of 
fault tolerance based on replication and redundancy is needed in order to achieve 
an underlying “hardware platform” of the required reliability. There are many 
configurations for redundant and replicated computer systems, and careful reliability 
analysis is required to evaluate the reliability provided by a given configuration and 
level of redundancy [20]. Such analyses show that suitably constructed N-modularly 
redundant systems (which we will call N-plexes for brevity) can achieve the desired 
reliability. 

Within an N-plex, all calculations are performed by N identical computer sys- 
tems and the results are submitted to some form of averaging or voting. Great care 
must be taken to eliminate single-point failures, so the separate computer systems 
(or “channels,” as they are often called in fault-tolerant systems) will generally use 
different power supplies and be otherwise electrically and physically isolated as far 
as possible. Notice, however, that there is no protection against design faults: any 
such faults in either the hardware or the software will be common to all members 
of the N-plex and all will fail together. In this report, we do not address the is- 
sue of design faults in the hardware, nor in the application software that it runs. 
We are, however, very much concerned with the possibility of design faults in the 
redundancy-management software that harnesses the failure-prone individual com- 
ponents together as a fault-tolerant N-plex. There is evidence (see page 8) that 
redundancy management is sufficiently complex and difficult that it can become the 
primary source of unreliability in a DFCS. 

The function performed by a DFCS is basically one of process control, as por- 
trayed in Figure 1.1. The goal is to control the airplane in flight under command 
of the pilot. Information about the state of the airplane, which is subject to exter- 
nal disturbances, is obtained through sensors, and control is exercised by sending 
commands to actuators. The basic structure of most process-control software is 
very similar: the software performs a repetitive cycle of sampling sensors and con- 
trol inputs, using control laws to calculate the required actuator response and then 
sending appropriate commands to the actuators. The complete cycle is generally 
broken into individual “frames,” each attending to a particular dimension of con- 
trol: for example, one frame may deal with pitch-control — sampling the appropriate 
sensors, computing the necessary corrections, and sending commands to the eleva- 
tors; another frame may deal with roll, still another with navigation, and so on. 
Some variables may need more rapid control than others, so that a complete cycle 
might contain four pitch-control frames, two roll frames, and only a single naviga- 
tion frame. This general pattern of activity is described as a multi-rate periodic 
schedule. 
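
The multi-rate periodic schedule described above can be sketched as follows. This is only an illustration: the frame names, the interleaving order, and the function names are assumptions introduced here, and a real frame would sample sensors, evaluate control laws, and command actuators rather than merely record its name.

```python
from collections import Counter

# Hypothetical static schedule for one complete cycle: four pitch frames,
# two roll frames, and a single navigation frame. The interleaving order
# is an assumption; the text specifies only the counts.
CYCLE = ["pitch", "roll", "pitch", "navigation", "pitch", "roll", "pitch"]


def run_cycles(cycle, n_cycles):
    """Return the sequence of frames executed over n_cycles repetitions."""
    executed = []
    for _ in range(n_cycles):
        for frame in cycle:
            # Stand-in for the real work of a frame.
            executed.append(frame)
    return executed


def frame_rates(cycle):
    """Count how many times each kind of frame runs in one cycle."""
    return Counter(cycle)


rates = frame_rates(CYCLE)
trace = run_cycles(CYCLE, 3)
```

Because the schedule is a fixed list that repeats unchanged, every cycle is identical, which corresponds to the static case; a dynamic scheduler would instead choose the next frame at run time.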
Figure 1.1: The DFCS Process-Control Loop 


Each frame will perform several computational activities: sampling sensors,
evaluating control laws, generating control outputs, performing self-tests, and so on. 
“Tasks” are the primitive computational elements within this structure: they are 
the individual units of activity that may be scheduled and executed. The schedul- 
ing slots within a frame and to which individual tasks may be allocated are called 
“subframes.” Thus, for example, the subframes within a pitch-control frame may 
be allocated to several sensor-sampling tasks, an averaging task to integrate the 
readings of redundant sensors, a control law task, and an actuator-output task. 

Many refinements are possible within this basic paradigm. For example, there 
may be a fixed, static, schedule of frames, so that all cycles are identical; alterna- 
tively, frames may be scheduled dynamically, depending on external circumstances. 
Similarly, frames may all execute for a common fixed duration, or may have different 
durations; they may always execute to completion, or may be subject to preemption, 
and so on. Whether task scheduling for critical real-time systems should be static or 
dynamic is a controversial issue. Proponents of static schedules point to Richards’ 
anomalies [21,22], in which the early completion of one task can cause another to be 
late, and other difficulties in dynamic scheduling as indications that the predictabil- 
ity required for hard real-time systems is best achieved by static scheduling. 

The major challenge in the design of a fault-tolerant N-plex for DFCS is one of
redundancy management. Instead of a single computer executing the DFCS software,
there will be several, which must coordinate and vote (or average) actuator
commands, 3 and tolerate faults among their own members. In addition to the
replicated computers, sensors and actuators will be replicated also. The management of
all this redundancy and replication adds considerable complexity to both the operating
system (generally called an “executive” in process-control systems) and the 
application tasks. Complexity is a source of design faults, and there is a distinct 
possibility that such a large quantity of additional code may lessen, rather than 
enhance, overall reliability. The goal of the research program, of which this work is 
a component, is to develop principled, structured, and formally specified and ver- 
ified approaches to the design and implementation of redundancy management in 
DFCS [9]. 

A plausibly simple approach to redundancy management in an N-plex is the 
“asynchronous” design, in which the channels run fairly independently of each other: 
each computer samples sensors independently, evaluates the control laws indepen- 
dently, and sends its actuator commands to an averaging or selection component 
that chooses the value to send to the actuator concerned. The triplex-redundant 
DFCS of the experimental AFTI-F16 was built this way, and its flight tests reveal 
some of the shortcomings of the approach [17,23]. 

First, because the unsynchronized individual computers may sample sensors at 
slightly different times, they can obtain readings that differ quite appreciably from 
one another. The gain in the control laws can amplify these input differences to 
provide even larger differences in the results submitted to the output selection al- 
gorithm. During ground qualification of the AFTI-F16, it was found that these 
differences sometimes resulted in a channel being declared failed when no real fail- 
ure had occurred [24, p. 478]. 4 Accordingly, rather a wide spread of values must 
be accepted by the threshold algorithms that determine whether sensor inputs and 
actuator outputs are to be considered “good.” For example, the output thresholds 

3 Voting or averaging is often performed directly by the actuators, through some form of “force- 
summing.” For example, different channels may energize separate coils of a single solenoid, or 
multiple hydraulic pistons may be linked to a single shaft [11, Figure 3.2-2]. 

4 Also, in the flight tests of the X31 the control system “went into a reversionary mode four times
in the first nine flights, usually due to disagreement between the two air data sources” [15]. 
of the AFTI-F16 were set at 15% plus the rate of change of the variable concerned; 
also the gains in the control laws were reduced. This increases the latency for de- 
tection of faulty sensors and channels, and also allows a failing sensor to drag the 
value of any averaging functions quite a long way before it is excluded by the input 
selection threshold; at that point, the average will change with a thump [23, Figure 
20] that could have adverse effects on the handling of the aircraft. 
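
The "thump" can be illustrated with a small simulation. The sketch below is not the AFTI-F16 algorithm: it assumes a hypothetical selection rule (average all readings within a threshold of the median) and hypothetical numbers, and shows only that a wider threshold lets a drifting sensor pull the average further before it is excluded, so that the eventual correction is larger.

```python
from statistics import median


def selected_average(readings, threshold):
    """Average the readings lying within `threshold` of the median.

    This selection rule is an assumption made for illustration.
    """
    mid = median(readings)
    good = [r for r in readings if abs(r - mid) <= threshold]
    return sum(good) / len(good)


def simulate(threshold, steps=40, drift_per_step=1.0, true_value=100.0):
    """Two healthy sensors read true_value; a third drifts steadily upward."""
    outputs = []
    for k in range(steps):
        failing = true_value + drift_per_step * k
        readings = [true_value, true_value, failing]
        outputs.append(selected_average(readings, threshold))
    return outputs


def largest_jump(outputs):
    """Largest change in the selected average between consecutive steps."""
    return max(abs(b - a) for a, b in zip(outputs, outputs[1:]))


wide = simulate(threshold=30.0)    # failing sensor tolerated for a long time
narrow = simulate(threshold=3.0)   # failing sensor excluded quickly
```

Under these assumed values the wide threshold lets the average be dragged to 110 before snapping back to 100, whereas the narrow threshold limits the excursion to 101.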

The danger of wide sensor selection thresholds is dramatically illustrated by a 
problem discovered in the X29A. This aircraft has three sources of air data: a nose 
probe and two side probes. The selection algorithm used the data from the nose 
probe provided it was within some threshold of the data from both side probes. 
The threshold was large to accommodate position errors in certain flight modes. 
It was subsequently discovered that if the nose probe failed to zero at low speed, 
it would still be within the threshold of correct readings, causing the aircraft to 
become unstable and “depart.” This error was found in simulation, but 162 flights 
had been at risk before it was detected [25]. 
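
The failure mode can be made concrete with a sketch of a selection rule of the kind described. The numbers, the units, and the fallback (averaging the side probes when the nose probe is rejected) are all assumptions; the source states only that the nose probe was used when it lay within a threshold of both side probes.

```python
def select_air_data(nose, left, right, threshold):
    """Use the nose probe if it is within `threshold` of both side probes.

    The fallback to the mean of the side probes is a hypothetical choice,
    not taken from the source.
    """
    if abs(nose - left) <= threshold and abs(nose - right) <= threshold:
        return nose
    return (left + right) / 2.0


# Hypothetical low-speed case: side probes read about 41 units, the nose
# probe has failed to zero, and the threshold is 50 units wide.
accepted = select_air_data(nose=0.0, left=40.0, right=42.0, threshold=50.0)

# The same failure with a narrower threshold is rejected.
rejected = select_air_data(nose=0.0, left=40.0, right=42.0, threshold=10.0)
```

With the wide threshold the failed reading of zero is passed through as if it were good data; with the narrow one the side probes are used instead.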

An even more serious shortcoming of asynchronous systems arises when the 
control laws contain decision points. Here, sensor noise and sampling skew may 
cause independent channels to take different paths at the decision points and to 
produce widely divergent outputs. This occurred on Flight 44 of the AFTI-F16 
flight tests [23, p. 44]. Each channel declared the others failed; the analog back- 
up was not selected because the simultaneous failure of two channels had not been 
anticipated and the aircraft was flown home on a single digital channel. Notice that 
all protective redundancy had been lost, and the aircraft was flown home in a mode 
for which it had not been designed — yet no hardware failure had occurred. 

Another illustration is provided by a 3-second “departure” on Flight 36 of the 
AFTI-F16 flight tests, during which sideslip exceeded 20°, normal acceleration
exceeded first -4g, then +7g, angle of attack went to -10°, then +20°, the aircraft
rolled 360°, the vertical tail exceeded design load, all control surfaces were operating
at rate limits, and failure indications were received from the hydraulics and 
canard actuators. The problem was traced to an error in the control laws, but sub- 
sequent analysis showed that the side air data probe was blanked by the canard at 
the high angle of attack and sideslip achieved during the excursion; the wide input 
threshold passed the incorrect value through and different channels took different 
paths through the control laws. Analysis showed this would have caused complete 
failure of the DFCS and reversion to analog backup for several areas of the flight 
envelope [23, pp. 41-42]. 

Several other difficulties and failure indications on the AFTI-F16 were traced to 
the same source: asynchronous operation allowing different channels to take different 
paths at certain selection points. The repair was to introduce voting at some of these 
“software switches.” In one particular case, repeated channel failure indications in 
flight were traced to a roll-axis “software switch.” It was decided to vote the switch 
(which, of course, required ad hoc synchronization) and extensive simulation and 
testing were performed on the changes necessary to achieve this. On the next flight, 
the problem was found still to be there. Analysis showed that although the switch 
value was voted, it was the unvoted value that was used [23, p. 38]. 
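
A minimal sketch of why a "software switch" causes divergence, and why voting it helps only if the voted value is the one actually used. The control law, its gains, the switch threshold, and the readings are all hypothetical.

```python
from collections import Counter


def switch_position(reading, threshold=10.0):
    """Hypothetical decision point: select the high-gain mode above threshold."""
    return reading >= threshold


def control_law(reading, high_mode):
    """Hypothetical control law whose two branches have very different gains."""
    return 5.0 * reading if high_mode else 0.5 * reading


def majority(values):
    """Return the value held by a strict majority, or None if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else None


# Three channels sample the same sensor at slightly different times.
readings = [9.98, 10.01, 10.02]

# Each channel uses its own switch value: one takes a different branch.
unvoted = [control_law(r, switch_position(r)) for r in readings]

# The channels vote the switch, and every channel uses the voted value.
voted_switch = majority([switch_position(r) for r in readings])
voted = [control_law(r, voted_switch) for r in readings]
```

With the switch unvoted, a reading difference of 0.04 becomes an output difference of about 45; with the voted switch the outputs stay within 0.2 of one another. Computing `voted_switch` but then calling `control_law(r, switch_position(r))` would reproduce the error described above.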

The AFTI-F16 flight tests revealed numerous other problems of a similar nature. 
Summarizing, Mackall [23, pp. 40-41] writes: 

“The criticality and number of anomalies discovered in flight and 
ground tests owing to design oversights are more significant than those 
anomalies caused by actual hardware failures or software errors. 

“. . . qualification of such a complex system as this, to some given level 
of reliability, is difficult . . . [because] the number of test conditions be- 
comes so large that conventional testing methods would require a decade 
for completion. The fault-tolerant design can also affect overall sys- 
tem reliability by being made too complex and by adding characteristics 
which are random in nature, creating an untestable design. 

“As the operational requirements of avionics systems increase, com- 
plexity increases. Reducing complexity appears to be more of an art 
than a science and requires an experience base not yet available. If the 
complexity is required, a method to make system designs more under- 
standable, more visible, is needed. 

“The asynchronous design of the [AFTI-F16] DFCS introduced a ran- 
dom, unpredictable characteristic into the system. The system became 
untestable in that testing for each of the possible time relationships be- 
tween the computers was impossible. This random time relationship 
was a major contributor to the flight test anomalies. Adversely affecting 
testability and having only postulated benefits, 5 asynchronous operation 
of the DFCS demonstrated the need to avoid random, unpredictable, and 
uncompensated design characteristics.” 

It is difficulties such as these that have caused those performing research in 
fault-tolerant systems for DFCS to prefer synchronized channels and exact-match 
voting [26-28]. Of course, the synchronization must itself be fault-tolerant and 
no such algorithms were known until about 1982. 6 A number of provably correct 
Byzantine fault-tolerant clock synchronization algorithms are now available [32-37], 

5 “The decision to use an asynchronous design for the AFTI-F16 DFCS was because the contractor
[Bendix under subcontract to General Dynamics] believed synchronization would introduce a
single-point failure caused by electromagnetic interference (EMI) and lightning effects” [23, p. 7],
which may well have been correct given the technology of the early 1980s. 

6 Prior to the investigations of the SIFT project [29], the subtlety and delicacy of voting and 
synchronization protocols were not properly understood and most were seriously flawed: all were 
vulnerable to Byzantine faults (which constitute a fault class that had not been recognized before), 
and many were incapable of tolerating less severe faults. For example, the failure of the first attempt 
and some have been formally verified [38]. An algorithm due to Infis and Moore [39] 
is attractively simple, and tolerates a very wide class of faults that is, however, short 
of the fully Byzantine. Probabilistic algorithms due to Cristian [40] can achieve very 
close synchronization, but also fall short of Byzantine fault tolerance. 

For exact-match voting, each channel must operate on the same data. Thus the 
computers cannot simply use their own private sensor readings, but must exchange 
sampled values with each other in a Byzantine fault-tolerant manner. By this means, 
every (working) computer begins each frame with the same set of sensor readings 
as the others. 7 Each computer will then run the same sensor selection and averag- 
ing algorithms, 8 and the same control laws, and should therefore generate identical 
actuator commands. Exact-match majority voting of the actuator commands then 
suffices to mask faults among the redundant channels. Notice that this arrange- 
ment allows sensor failures to be distinguished from failures among the redundant 
computers: sensor failure is detected or masked by the diagnostic, averaging, and 
selection algorithms run by each computer, whereas failure of a computer is masked 
(and optionally detected) by the exact-match majority voting of their outputs. In 
contrast, systems based on unsynchronized, independent channels cannot distin- 
guish accurately between the failure of a sensor and that of a computer, and may 
mistake the consequences of clock drift for either. 
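
Exact-match majority voting of actuator commands can be sketched in a few lines. The function names and the integer command values are illustrative assumptions; the point is only that, once all working channels compute bit-identical commands, a strict majority both masks a faulty channel and identifies it.

```python
from collections import Counter


def exact_match_vote(commands):
    """Return the command issued by a strict majority of channels, else None."""
    value, count = Counter(commands).most_common(1)[0]
    return value if 2 * count > len(commands) else None


def disagreeing_channels(commands):
    """Indices of channels whose command differs from the voted one.

    Returns None when no majority exists, since no channel can then be
    singled out as faulty.
    """
    voted = exact_match_vote(commands)
    if voted is None:
        return None
    return [i for i, c in enumerate(commands) if c != voted]


# Hypothetical 4-plex in which channel 2 has produced a bad command.
commands = [1200, 1200, 9999, 1200]
voted = exact_match_vote(commands)
suspects = disagreeing_channels(commands)
```

No tolerance band is needed here, in contrast to the threshold comparisons required by asynchronous designs.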

Majority voting of actuator commands is sufficient to tolerate up to $\lfloor (N-1)/2 \rfloor$ faults.
However, the underlying Byzantine fault-tolerant clock synchronization and interactive
consistency algorithms can tolerate only $\lfloor (N-1)/3 \rfloor$ faults: thus a 4-plex is required 
for single-fault tolerance, and a 7-plex for two-fault tolerance. Notice, however, that 
the 7-plex can withstand two simultaneous faults; if the fault arrival rate is such 
that a faulty channel can be identified and configured out of the system before the 
next fault arrives, then a 7-plex can withstand 4 faults, and two-fault tolerance can 
be achieved by a 5-plex. Fault detection and reconfiguration are complex functions, 
however, and given our desire to reason formally about fault-tolerance properties, 
we follow [9] and consider only the nonreconfigurable case in this work. (Reconfig- 
uration was considered in the verification of SIFT [53].) 
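
The arithmetic behind these figures can be checked with a short sketch. It assumes the standard bounds: majority voting over N channels masks up to floor((N-1)/2) faults, and Byzantine fault-tolerant algorithms up to floor((N-1)/3). The reconfiguration function is an idealization introduced here: faults arrive one at a time, and each faulty channel is removed before the next fault arrives.

```python
def max_faults_majority_voting(n):
    """Simultaneous faults masked by majority voting over n channels."""
    return (n - 1) // 2


def max_faults_byzantine(n):
    """Simultaneous Byzantine faults tolerated by n channels."""
    return (n - 1) // 3


def min_channels_byzantine(m):
    """Smallest N-plex tolerating m simultaneous Byzantine faults."""
    return 3 * m + 1


def faults_with_reconfiguration(n):
    """Sequential faults survivable with ideal reconfiguration (assumed)."""
    tolerated = 0
    while max_faults_byzantine(n) >= 1:
        tolerated += 1   # the current configuration masks one more fault
        n -= 1           # the faulty channel is then configured out
    return tolerated
```

For example, `min_channels_byzantine(1)` and `min_channels_byzantine(2)` give 4 and 7, while `faults_with_reconfiguration(7)` and `faults_with_reconfiguration(5)` give 4 and 2, matching the figures quoted in the text.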

Not all faults are equal: some are “hard” faults that permanently disable the
afflicted channel; others are “soft” or “transient” faults from which recovery is pos- 

to launch the Space Shuttle was due to a synchronization problem [30], and the heavy radiation 
environment at Jupiter caused loss of synchronization on the Voyager spacecraft [31]. 

7 A given sensor may be sampled independently by several computers; all of these independent 
samples must be distributed to all other computers in a Byzantine fault-tolerant manner. As 
with clock synchronization, several Byzantine agreement (or interactive consistency) algorithms are 
known [41], and some have been formally verified down to the hardware implementation level [42,43]. 

8 In addition to detecting faults, the processing of sensor data must deal with noise, bias, drift,
hysteresis, and other sensor-specific issues. The problems of sensor averaging, selection, and
(especially) fault diagnosis have been considered, more or less independently, by several disciplines: for 
example, control theory [44-49], artificial intelligence [50,51], and computer science [52]. 
sible. Examples of transient faults include SEUs (where a single bit of memory is 
flipped by a cosmic ray), which can be recovered by simply restoring the affected bit 
to its correct value. Experience indicates that transient faults are orders of magnitude
more common than hard faults: for example, the Voyager spacecraft suffered 42
SEUs in the intense radiation surrounding Jupiter, but no hard faults [54]. It follows
that overall reliability will be much greater (or, equivalently, much less redundancy
will be required for a given level of reliability) if some attempt is made to recover 
channels that suffer transient faults. 

There is no firm line between transient and hard faults considered in the abstract; 
what might be merely a transient fault to one system may be a hard fault to another 
that lacks the necessary recovery mechanisms. Fault-tolerant system architectures 
are designed and evaluated against explicitly stated fault models. For transient 
faults, we employ a fault model in which we distinguish two subclasses of faults. 

State data faults are those in which the processor is working correctly (i.e., is 
synchronized and executing the right task), but its local state data are cor- 
rupted. If its state data were replaced with correct values, it would recover. 
In our formal model, the predicate OK(i)(c) will indicate whether processor i 
has state data faults that can affect its computation of task c. 

Control faults are those in which the processor is not working correctly (i.e., some- 
thing other than, or additional to, a state data fault has occurred). In our 
formal model, the predicate ^(i)(j) will indicate whether processor i suffers a 
control fault during the computation of the j’th task. 

In our model, we think of control faults as happening spontaneously, and state data 
faults as the consequences of control faults. Faults such as SEUs, in which a single 
bit of state data is spontaneously corrupted, can be considered as instantaneous 
control faults: we imagine that the processor computes the wrong value but then 
immediately recovers, leaving a state data fault behind. Note that a state data 
fault may precipitate a further control fault. For example, a word of memory may 
become set to zero (a state data fault); then a subsequent divide operation using 
that word might generate a divide-by-zero trap, which could halt the processor (a 
control fault). 

State data faults can be recovered by periodically replacing the state data main- 
tained by each processor with a majority-voted version. It is not necessary to vote 
and replace all the state data, since many of them are refreshed by sampling sensors 
(i.e., some of the state data are “stored” in the airframe itself [18]): only the data 
that are carried forward from one frame or cycle to the next (e.g., time-integrated 
data such as velocity and position) need to be voted. Even so, the quantity of state 
data maintained by a modern DFCS is considerable, and performance would be se- 
riously degraded if all of it were voted at every opportunity. Accordingly, exposure 
is traded for performance and rather sparse voting patterns are preferred. Clearly 
the less frequently a particular item of state data is voted, the longer will be the 
duration of the consequences of a fault that corrupts that item. Overall reliability 
will be determined by the fault arrival rate, the voting pattern, and the dataflow 
dependencies among control tasks and state data items. 
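
The trade between voting frequency and exposure can be illustrated with a toy simulation. It is not the formal model of this report: it assumes a single integer state item that every channel increments each frame, a single injected state data fault, perfectly synchronized channels, and voting as an atomic step every `vote_period` frames. All names and values are hypothetical.

```python
from collections import Counter


def majority(values):
    """Return the value held by a strict majority, or None if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else None


def exposure(n_channels, n_frames, vote_period, fault_frame, faulty_channel=0):
    """Count the frames that end with the faulty channel's state incorrect."""
    states = [0] * n_channels
    reference = 0          # state of an ideal, fault-free single computer
    bad_frames = 0
    for frame in range(n_frames):
        # Every channel carries its state forward and integrates the input.
        states = [s + 1 for s in states]
        reference += 1
        if frame == fault_frame:
            states[faulty_channel] = -999      # injected state data fault
        if (frame + 1) % vote_period == 0:
            voted = majority(states)
            if voted is not None:
                states = [voted] * n_channels  # replace state with voted value
        if states[faulty_channel] != reference:
            bad_frames += 1
    return bad_frames
```

In a 4-plex run for 20 frames, voting every frame leaves no frame with a corrupted state, whereas voting every 5 or every 10 frames leaves the faulty channel wrong until the next vote (4 and 9 frames respectively for the fault times used in the test cases).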

In this report, our goal is to develop, and formally specify, a model that describes 
the operation of an N-plex with transient-recovery based on an arbitrary sparse 
voting pattern. We will formally verify a theorem concerning the conditions under 
which such a system masks faults successfully. A concrete instance of the theorem 
(for a specific data dependency graph and voting pattern) might be that the system 
is “safe” provided that at most two processors suffer control faults in any sequence 
of five successive frames. Markov or other methods of reliability analysis must be 
used to determine the overall reliability of the system, given assumptions about the 
arrival and repair rates of control faults [9]. 

A fault-tolerant system should take active measures to recover from transient 
control faults, in addition to the voting strategy for overcoming state data faults. 
The Mars system [55,27] is a good example of a system that provides such recovery.
In our model, however, we do not consider the internal details of mechanisms that
achieve recovery from control faults; we model only their external behavior. The 
purpose of our model is to derive properties of the majority voting strategy for 
masking faults of all kinds and recovering from state data transient faults. 


1.3 Formal Models for DFCS 

In this section we sketch the larger context of the work described here, and then give 
an overview of the model for fault-masking and transient-recovery that we employ. 

This work was performed in the context of a research program led by NASA Lan- 
gley Research Center that aims to develop a fault-tolerant architecture for DFCS 
using formal methods to provide a rigorous basis for documenting and analyzing de- 
sign decisions. Ultimately, we hope to provide mechanically-checked formal specifi- 
cations and verifications for the key components of a “Reliable Computing Platform” 
for DFCS, going all the way from high-level requirements down to implementation 
details. Clearly, this is a major undertaking, so initially we are concentrating on 
some of the better-understood requirements and levels in the hierarchy. 

As we described in the previous section, synchronized channels and Byzantine 
fault-tolerant distribution of sensor values are now fairly well-understood require- 
ments. Accordingly, the first mechanically-checked specifications and verifications 
undertaken in this program were those performed for Byzantine fault-tolerant clock 
synchronization algorithms [38,56] and for a Byzantine agreement algorithm [42] 
and circuit [43]. The work described here is a step towards the next higher layer
in the modeling hierarchy: the layer that uses exact-match voting to provide
fault-tolerance and transient-recovery. 

Accurate modeling of that layer must account for the fact that the separate 
channels are not perfectly synchronized (the clock-synchronization algorithms keep 
the separate channels synchronized only within some small skew $\delta$ of each other),
and that the communication and coordination of voting data takes a certain amount 
of time. The work presented here ignores those details in order to concentrate on 
the relationship between voting patterns, fault masking, and transient recovery. 
Thus, we make the simplifying assumptions that the separate channels are perfectly 
synchronized, and that the communication and voting of data constitute a single 
atomic action. 9 

Our current work, following on from that described here, aims to eliminate these 
simplifying assumptions. In other current work, we are developing and formally 
verifying a hardware-assisted implementation of one of the clock-synchronization 
algorithms. Future work may consider the mechanisms by which failed channels 
can be recovered, or the system reconfigured. The next section gives an informal 
overview of the model that is the focus of the present analysis. 

1.3.1 Overview of the Fault-Masking Model Employed 

In companion work at NASA Langley Research Center, Di Vito, Butler and Cald- 
well [9] have developed a formal model for DFCS and derived its fault-masking and 
transient-recovery properties. Their model and development are formal and rigorous
in the manner of conventional mathematical discourse. The purpose of our investiga- 
tion is to construct a completely formal, machine-checked specification for a similar 
model, and to submit the derived properties to mechanical proof-checking. The two 
investigations are complementary: the first is intended to model the structure of a 
realistic platform for DFCS, while the second is intended to explore the problems 
of subjecting formal specifications and verifications in this domain to mechanically 
checked analysis. 

Our model for fault masking and transient-recovery was developed in parallel 
with that of [9] and differs from it in several details, though not in overall principle. 
In this section, we briefly sketch the model of Di Vito, Butler, and Caldwell, and 
explain how and why ours differs. The relationship is described in more detail in 
Chapter 4. 

Di Vito, Butler, and Caldwell model a reliable computing platform for DFCS 
with the following characteristics: 

• The system workload is a multi-rate periodic schedule. 

9 Verification of the Oral Messages Byzantine agreement algorithm [42,43] makes the same
simplifying assumptions.

• The schedule is static (i.e., the sequence of frames is identical from one cycle
to another, and the subsequence of tasks within a given frame is the same in 
every activation of that frame). 

• All frames have equal duration; however, different frames may have different 
numbers of tasks, and different tasks may have different duration. Unused 
time at the end of a frame is called “slack” time; it can be used to run self- 
tests. Some slack time at both the beginning and the end of each frame is 
essential when discrete-update clock synchronization is used, since otherwise 
tasks could be skipped (if the clock jumps forward) or repeated (if it jumps 
back) [34]. 

• The output of a task may be used as input to a later task up to one cycle 
later. Data that need to be carried further forward must be relayed through 
intermediate tasks. 

• Sensors are sampled and actuators commanded at most once per frame. An 
underlying Byzantine fault-tolerant distribution of sensor samples is assumed,
so that each (working) channel receives identical sensor input. 

• The fault model distinguishes processors that are working correctly throughout 
a frame from those that are not. In our terminology, correctly working proces- 
sors, or more briefly, working processors, are those without control faults. A 
fault-status predicate indicates whether a given processor is working or not in 
the current frame. Faults can be either permanent (i.e., hard) or transient — 
the latter is modeled by a processor whose fault-status is not working in one 
frame and working in a later one. The model does not consider the mechanisms 
by which such recovery might be achieved. 10 

• Various voting patterns are considered. In continuous voting, all state data 
are voted every frame; in cyclic voting, only the outputs of tasks in the current 
frame are voted in that frame; minimal voting uses the dataflow dependencies 
among tasks to derive conditions that vote the minimum data each frame. 
A distinguished state data item, the frame-counter, is always voted at every
frame.

• All processors run identical workloads. The benchmark with respect to which 
fault-masking and transient-recovery results are proved is a single processor 
running the same workload that suffers no faults. 

10 Among the likely mechanisms are watchdog timers that trap to automatic re-initialization code, 
and similar reinitialization of the losers in a majority vote. In addition, the schedule table and the 
object code for the system executive and application tasks may be held in ROM, where all faults 
may be assumed hard, but also extremely rare.

Our model is very similar in spirit and motivation to that just described; it differs
in being considerably more abstract. The reason for this is that we want our results 
to be as widely applicable as possible. Mechanically checked formal specification and 
verification are very time consuming to perform and mechanically checked proofs 
tend to be rather fragile. By this we mean that redoing a mechanically checked 
proof to accommodate changes to the statement of a theorem, or modifications to 
supporting lemmas, may require a quantity of effort and insight comparable to that 
required to construct the proof in the first place. Thus it is often not cost-effective to 
prove properties about a variant model by adjusting proofs from the original model. 
A generally more productive approach is to employ abstraction and hierarchy: one
attempts to extract the essence of the problem and to prove the most general results
possible for an abstract formulation of the problem. More concrete models can then 
be constructed as instantiations or elaborations of the abstract model, and properties 
concerning the elaborations can be proved using the abstract theorem as a lemma. 

In the present case, we obviously wish to derive results that are sufficiently gen- 
eral that they can apply to all three of the voting schedules considered by Di Vito, 
Butler, and Caldwell. We would also like them to be applicable to systems that 
make rather different basic assumptions — for example, systems in which sensors are 
sampled and actuators commanded more than once per frame, or in which not all
cycles have identical frame schedules (so that dynamic scheduling can be
accommodated). We wish to state and prove general results along the following lines:
provided the voting strategy satisfies certain properties, and provided certain fault
assumptions are met, an N-plex correctly masks faults and recovers from transients.

A little thought reveals that the essence of this problem concerns the interaction 
between voting strategies, task schedules, and data dependencies. To see this, con- 
sider a particular actuator command. We want the majority value for this command 
to equal the “correct” value (i.e., that which would be produced by a single fault-free 
processor). Clearly, this will be so if a majority of processors are working correctly 
at the time they execute the task concerned and if they receive the correct input 
values. Input values either come from sensors (and our requirement here is that 
all working processors receive the same values), or they are the outputs of previous 
tasks, which may or may not have been voted. In the case of voted outputs, we 
recurse on the conditions that establish the correctness of voted outputs; in the case 
of nonvoted outputs, the requirement is that the majority were working correctly 
when that task was executed, and that their inputs were, in turn, correct at that 
point. Obviously a development along these lines must make very careful statements 
about its assumptions, and there are many tricky details to be taken care of, but 
it is equally obvious that the notions of cycles and frames are not essential to the 
argument: it is the order in which tasks are executed, the dataflow dependencies 
among them, and the placement of majority votes that determine the correctness of 
the overall scheme.

Thus, frames and cycles are not explicitly represented in our model: we represent
the system workload by the dataflow dependency graph among task activations, 11 
and a record of the order in which tasks are activated. We allow voting to be 
specified for the outputs of any task activation, and we model processor failure at 
the task activation level (i.e., a given processor is either working or not working 
during a given task activation). It should be clear that a periodic, frame-based 
interpretation can be achieved by simply imposing additional structure on the task 
activation dataflow dependency graph and on the task execution schedule. (For 
example, by requiring them to have a periodic structure, allowing only one voted 
task per frame, treating failure during any task activation as a failure for the whole 
frame, and so on.) In this way, results proven for our abstract model provide a basis 
for deriving results for more concrete models relatively easily. 

In addition to cycles and frames, we have abstracted away another aspect of 
the model of Di Vito, Butler, and Caldwell: the frame-counter. Some may consider 
our use of abstraction to have been overly aggressive in this regard. Our original 
motivation was as follows. For a given processor to compute the correct outputs 
for a certain task activation, it must be working correctly during that task, and 
it must get the correct inputs. Whether it gets the correct inputs is a function of 
when data were voted, and of how long the processor has been working correctly. 
Here, “working correctly” means correctly executing the right programs at the right 
time, but on possibly corrupted data — i.e., it is the absence of control faults. We 
do not model the mechanisms by which a processor that has been not working (i.e., 
has suffered a transient control fault) gets back into the working state (i.e., recovers 
from the control fault). Part of this process may involve purging internal corruption 
(e.g., a stuck-at carry-flag) by means of a system reset, or a power cycle. Another 
part may involve reloading external state data (such as the identity of the current 
point in the task schedule — i.e., the frame counter). Surely, reloading this datum is 
simply part of the internal process of recovery from a control fault, and is therefore 
part of an activity that we have explicitly chosen not to model. 

A counter-argument to this position would observe that the only reliable source 
for such external data is the majority-voted consensus of the other processors. Thus,
this part of the process for recovery from control faults depends on the voting 
strategy and on the mechanism for recovery from state data faults— the very core 
of what we have chosen to model. We are partly persuaded by this argument, but 
note that the data concerned differ from other state data treated within the model 
in that they are not produced and consumed by application tasks but by the system 

11 A task properly refers to a particular program, viewed as a static entity (e.g., as a sequence
of bytes, or as a function from inputs to outputs); a task activation refers to an instance of that
program in execution. There is only one instance of each task, but it gives rise to many activations.
Sometimes, when the context makes the intended interpretation clear, we use the shorter term task
to mean task activation.

executive itself. On the other hand, we are not attracted to a special-case treatment
of the frame counter — if other system state data needed to be recovered in a similar 
way, another special-case adjustment to the model might be needed. 

Our current preference thus remains the exclusion of the frame counter from our 
basic system model. However, the frame counter (and other state data used by the 
executive rather than by the application tasks) can be introduced quite simply and 
naturally when the model is instantiated: simply introduce a voted task (interpreted 
as the vote of the frame counter and other system state data) at the beginning of each 
frame 12 and introduce a data dependency of all other tasks within the frame on the 
output of that particular task. This last is an artifact of the model (in that no real 
dataflow need occur), but serves to establish the (control) dependency of subsequent 
tasks upon the correctness of the value for the frame counter obtained by (the task 
corresponding to) the vote on its value. The task that votes the frame counter is 
understood to be a standard task performed by all (synchronized) processors at 
frame-start time, independently of whether they (already) know what frame it is 
they should be executing. 

Although the modeling is indirect, this approach allows the properties of sys- 
tems with a voted frame counter to be derived correctly, while preserving the 
abstractness — and hence the wider applicability — of our model. Unlike special-case 
treatment for the frame counter, our approach easily accommodates more or less 
frequent voting of this value, and the introduction of additional state data that are 
required for the correct execution of the executive itself. 

In Chapter 2, we present the details of our fault-masking model in the form of 
a traditional mathematical development. 


12 In [9], all voting occurs at the end of each frame; thus, in this case, the identity of the current 
frame is recovered by the vote at the end of the previous frame. Clearly, our approach can be 
adjusted to accommodate this alternative arrangement. 



Chapter 2 

The Fault-Masking Model 


Our goal is to prove that, subject to certain conditions, an N-plex provides transient- 
recovery and fault masking for a certain class of faults. Our first requirement, there- 
fore, is a benchmark model for correct, fault-free behavior, against which the efficacy 
of transient-recovery and fault masking in the N-plex may be judged. We take as 
our benchmark a model for the behavior of a fault-free process-control system. Our 
model for an N-plex will then compose N fault-prone versions of the basic model, 
together with some voting and recovery mechanisms, and our theorem will estab- 
lish that the voted results of the N-plex equal those of the fault-free system (under 
suitable conditions). We begin by describing our model for fault-free process control. 


2.1 A Model for Fault-Free Process Control 

A process-control system manages some physical device by sending control signals 
to actuators. The values of the control signals are determined by calculations based 
on the values of sensors that monitor the device and on a record (maintained by 
the process-control system) of the state of the system. The process-control system 
is internally composed of computational tasks that are activated periodically in 
order to sample sensors, perform the necessary calculations, and send values to the 
actuators. Some tasks may also perform internal housekeeping functions. Because 
task activations may depend on the results of other task activations, there is a 
dataflow dependency among task activations that the execution schedule must take 
into account. The “slots” in the execution schedule are called cells 1 ; a process- 
control system requires a specification of which tasks are assigned to which cells, the 
dataflow relationships among cells, and the order in which cells are to be executed. 
These ideas are formalized in the following definitions. 


1 In a frame-based system they are often called subframes.

We assume

• A set $C$ of cells, and

• A relation $\mathcal{G} \subseteq C \times (\mathbb{N} \times C)$ (where $\mathbb{N}$ denotes the natural numbers),

and we define

• $M \stackrel{\mathrm{def}}{=} \{1, 2, \ldots, |C|\}$.

Cells correspond to the activations (or executions) of tasks (to be formally defined
later) or the sampling of sensors; the relation $\mathcal{G}$ records the dataflow
dependencies among task activations associated with cells: the interpretation of
$(i, (n, j)) \in \mathcal{G}$ is that the output of the task activation (or sensor sample)
associated with cell $i$ supplies the input for the $n$'th argument of the task
activation associated with cell $j$. A simplified relation

• $G \stackrel{\mathrm{def}}{=} \{(i, j) \mid \exists n : (i, (n, j)) \in \mathcal{G}\}$

captures just the basic dataflow dependencies among cells, without concern for which
input of cell $j$ it is that receives its data from $i$. We will ensure by conditions given
later that $G$ is a directed acyclic graph, so that there are no circularities in the
dependencies among cells.

Note that the set C of cells comprises all the task activations performed during 
a single run of the system (which may extend for the entire lifetime of the system). 
It is therefore potentially unbounded (though finite) in size. For many (statically 
scheduled) process-control systems, the set C and its associated data dependency 
graph G will have a repetitive structure induced by the “unrolling” of a periodic, or 
cyclic, pattern of activity.

Cells with indegree zero in $G$ are called sensor cells; those with outdegree zero
are called actuator cells. The set of sensor cells is denoted $C_S$; that of actuators is
denoted $C_A$. Nonsensor cells (including actuator cells) have a computational task
associated with them and are called active-task cells. The set $C \setminus C_S$ of active-task
cells is denoted $C_T$.
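
As an informal illustration of these definitions (it is not part of the formal
development), the following Python sketch derives the simplified relation from a
small dataflow relation and classifies the cells by indegree and outdegree. The
cell names, the edges, and the helper names simplify and classify are all
hypothetical, invented for this example.

```python
# Hypothetical example: cell names and edges are invented for illustration.
# An element (i, (n, j)) means: the output of cell i is argument n of cell j.
CELLS = {"s1", "s2", "t1", "a1"}
FULL_G = {
    ("s1", (1, "t1")),
    ("s2", (2, "t1")),
    ("t1", (1, "a1")),
}


def simplify(full_g):
    """Forget argument positions, keeping only (producer, consumer) pairs."""
    return {(i, j) for (i, (_n, j)) in full_g}


def classify(cells, g):
    """Split cells into sensor cells (indegree zero), actuator cells
    (outdegree zero) and active-task cells (every nonsensor cell)."""
    consumers = {j for (_i, j) in g}
    producers = {i for (i, _j) in g}
    sensor_cells = cells - consumers
    actuator_cells = cells - producers
    active_task_cells = cells - sensor_cells
    return sensor_cells, actuator_cells, active_task_cells


G = simplify(FULL_G)
C_S, C_A, C_T = classify(CELLS, G)
```

In this example the actuator cell a1 is also an active-task cell, in line with
the remark that nonsensor cells include the actuator cells.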

Each task activation (or sensor sample) generates a value that is either com- 
municated to an actuator or stored so that it will be available as input to later 
task activations. The system state records these stored output values. Formally, we
define

• A set $D$ of domain values, and

• A set of states $\mathcal{S} \subseteq C \to D$.

The data values computed, stored, and manipulated by the system are assumed
to be drawn from the uninterpreted domain $D$. The system state is represented by a
function from cells to this domain: if $\sigma \in \mathcal{S}$ is the instantaneous state of the system,
and $c$ is a cell, then $\sigma(c)$ denotes the output value stored for that cell. It may seem
that a system satisfying this description must have a huge amount of storage in order 
to record the values of all task activations for all time. This is not so. Anticipating 
definitions that are given below, we observe that tasks are executed in a sequential 
order that respects the dependency ordering represented in the graph G, and run 
to completion. There is no need to record a value for a cell that has not yet been 
executed, nor for one whose immediate successors in the relation G have already 
completed. Although this result is intuitively obvious, its formal verification is an 
interesting exercise (see page 47). 

Formalizing the notion of sequential execution, we introduce 

• A bijection $\mathit{sched} : M \to C$, with

• Inverse $\mathit{when} : C \to M$.

The interpretation here is that the $i$'th task execution (or sensor sample) is the
one associated with cell $\mathit{sched}(i)$; conversely, the activity at cell $c$ is the
$\mathit{when}(c)$'th to be executed. We require that the order of execution respect the
dataflow dependencies recorded in $G$:

$$(i, j) \in G \supset \mathit{when}(i) < \mathit{when}(j).$$

Notice that this requires that $G$ is acyclic.
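
The ordering requirement can be checked mechanically on small examples. The
following Python sketch is an illustration only: the function names
respects_dataflow and has_valid_schedule are hypothetical, a schedule is assumed
to be given as a list of cells in execution order, and the brute-force search is
practical only for a handful of cells. It also shows why the requirement forces
the graph to be acyclic: a two-cell cycle admits no valid schedule.

```python
from itertools import permutations


def respects_dataflow(sched, g):
    """sched lists the cells in execution order; g is a set of
    (producer, consumer) pairs. True when every producer runs first."""
    when = {cell: m for m, cell in enumerate(sched, start=1)}
    return all(when[i] < when[j] for (i, j) in g)


def has_valid_schedule(cells, g):
    """Brute-force search over all orderings (tiny examples only)."""
    return any(respects_dataflow(p, g) for p in permutations(sorted(cells)))


# Hypothetical acyclic example and a two-cell cycle.
acyclic = {("s1", "t1"), ("s2", "t1"), ("t1", "a1")}
cyclic = {("x", "y"), ("y", "x")}

ok = respects_dataflow(["s1", "s2", "t1", "a1"], acyclic)
bad = respects_dataflow(["t1", "s1", "s2", "a1"], acyclic)
cycle_schedulable = has_valid_schedule({"x", "y"}, cyclic)
```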

Active-task cells have some computational task associated with them, so we 
require 

• A set $T \subseteq \mathcal{S} \to D$ of task-functions, and

• A function $\mathit{task} : C_T \to T$.

When an active-task cell $c$ executes, the function $\mathit{task}(c)$ is applied to the current
state, say $\sigma$, yielding the result $\mathit{task}(c)(\sigma)$. This is then stored in the system state
as the value of cell $c$ to yield a new state $\tau$. That is,

$$\tau = \sigma \text{ with } [c := \mathit{task}(c)(\sigma)]$$

where with [...] denotes function modification (as in Ehdm). 2 The only
components of the system state that may influence the result are those of the immediate

2 The notation $f$ with $[x := a]$, where $x$ is a value in the domain of $f$ and $a$ a value in the range,
denotes a function with the same signature as $f$ defined by

$$f \text{ with } [x := a](y) = \text{if } x = y \text{ then } a \text{ else } f(y).$$

predecessors of cell $c$ in the dataflow dependency graph $G$. 3 Formally, we state this
as a requirement that the result be functionally dependent on just those values:

$$(\forall (a, c) \in G : \sigma(a) = \tau(a)) \supset \mathit{task}(c)(\sigma) = \mathit{task}(c)(\tau).$$

Sensor cells store their results in the system state just like active-task cells. 
However, they take no input from the system state; instead, they sample properties 
of the external environment (including control inputs). These properties vary with 
time, so it might seem that sensors should be modeled as functions of real-time. In 
fact, this is unnecessary and inappropriate, since our model is not concerned with 
real-time properties such as absolute execution rates, but with those of sequencing 
and voting. We want to prove that if an N-plex gets the same sensor samples as 
an ideal fault-free system, then it will deliver the same actuator commands (despite 
the occurrence of faults). Thus, we need only model the sensor samples actually 
obtained, which can be done by modeling sensor samples as functions of position in 
the execution schedule (i.e., we use the number of cells executed as our notion of 
“time”). Thus we introduce 

• A set $S \subseteq M \to D$ of sensor-functions, and

• A function $\mathit{sensor} : C_S \to S$.

When a sensor cell $c$ executes, the sensor-function $s = \mathit{sensor}(c)$ samples the
environment (at time $\mathit{when}(c)$) to yield the value $s(\mathit{when}(c))$. This is then stored
in the system state as the value of cell $c$.

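Formally, the execution of cells is modeled by the function

• $\mathit{step} : \mathcal{S} \times C \to \mathcal{S}$

where

$$\mathit{step}(\sigma, c) \stackrel{\mathrm{def}}{=} \sigma \text{ with } [c := \text{if } c \in C_S \text{ then } \mathit{sensor}(c)(\mathit{when}(c)) \text{ else } \mathit{task}(c)(\sigma)]$$

is the new state that results from executing the task of cell $c$ in state $\sigma$ at time
$\mathit{when}(c)$.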

We are interested in the state after the system has executed some number m of 
cells according to its schedule. This is modeled by the function 

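• $\mathit{run} : M \to \mathcal{S}$

3 Operationally, the function $\mathit{task}(c)$ is applied to the tuple of values

$$(\sigma(c_1), \sigma(c_2), \ldots, \sigma(c_n))$$

where $(c_i, (i, c)) \in \mathcal{G}$ (the dataflow relation that records argument positions)
and $n = \mathit{indegree}(c)$.

where

$$\mathit{run}(0) \in \mathcal{S},$$

$$\mathit{run}(m + 1) \stackrel{\mathrm{def}}{=} \mathit{step}(\mathit{run}(m), \mathit{sched}(m + 1)).$$

A variant is the function

• $\mathit{runto} : C \to \mathcal{S}$

where

$$\mathit{runto}(c) \stackrel{\mathrm{def}}{=} \mathit{run}(\mathit{when}(c))$$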

is the state of the system when execution of its schedule has reached cell c. Observe 
that run(0), the initial state, is chosen arbitrarily. 
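
The fault-free model is small enough to execute directly. The following Python
sketch is one possible rendering under stated assumptions: states are
dictionaries from cell names to values, function modification is a dictionary
copy with one entry replaced, and the three-cell system (its cell names, sensor
function, and task functions) is hypothetical. It is meant only to make the
definitions of step, run, and runto concrete.

```python
# Hypothetical three-cell system: one sensor, one internal task, one actuator.
SCHED = ["s1", "t1", "a1"]                           # sched(1), sched(2), sched(3)
WHEN = {c: m for m, c in enumerate(SCHED, start=1)}  # inverse of sched
SENSOR = {"s1": lambda m: 10 * m}                    # functions of schedule position
TASK = {
    "t1": lambda state: 2 * state["s1"],             # reads only its predecessor s1
    "a1": lambda state: state["t1"] + 1,             # reads only its predecessor t1
}
INITIAL = {c: 0 for c in SCHED}                      # run(0): chosen arbitrarily


def step(state, c):
    """Return state with [c := new value]; the argument is not mutated."""
    value = SENSOR[c](WHEN[c]) if c in SENSOR else TASK[c](state)
    return {**state, c: value}


def run(m):
    """State after the first m cells of the schedule have executed."""
    state = INITIAL
    for c in SCHED[:m]:
        state = step(state, c)
    return state


def runto(c):
    """State once execution of the schedule has reached cell c."""
    return run(WHEN[c])


final = runto("a1")
```

Because the schedule respects the dataflow dependencies, every task reads values
that have already been computed, so the arbitrary initial values never reach the
actuator in this example.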

2.2 The N-plex Model 

In this section, we admit the possibility that machines may fail and we introduce 
replication and voting to overcome that fallibility. 

We assume a replicated system comprising r > 3 component systems of the type 
described in the previous section and we define 

• $R \stackrel{\mathrm{def}}{=} \{1, 2, \ldots, r\}$.

In the following, we will often refer to the component systems as “machines.” 

Component machines may fail and revive independently; at any time a machine 
is either “failed” or “working.” This is specified by a function 

• $\mathcal{F} : R \to (M \to \{T, F\})$

where $\mathcal{F}(i)(m)$ is T just in case component machine $i$ is failed at time $m$. 4 Intuitively,
a component machine i is failed at time m if it suffers a control fault at any point 
during execution of the task scheduled at time m. We know nothing at all about the 
behavior of failed component machines. Working (i.e., non-failed) machines correctly 
compute the function associated with the task scheduled at time m. However, the 
result computed may be incorrect if an earlier failure has caused the input data to 
be bad. A machine that is working correctly, but on bad data, has state data faults 
that will eventually be overcome through majority voting of state data. 

States of the replicated machine are drawn from the set 

• $\mathcal{R} \subseteq R \to \mathcal{S}$.

4 A function with range $\{T, F\}$ can be interpreted as the characteristic predicate of a set (this
is how sets are defined in Ehdm). Thus $\mathcal{F}(i)$ can be interpreted as the set of times when the $i$'th
machine is failed during execution of the cell scheduled at that time.

Thus, if $\rho \in \mathcal{R}$ is a replicated state, then $\rho(i)$ is the state of the $i$'th component
machine, and $\rho(i)(c)$ is the value of cell $c$ in that machine.

The components of a replicated machine behave much like a single machine, 
except that components may fail, and so they periodically vote their results. Thus 
we assume a set 

• $C_V$ of voted cells

and require

$$C_A \subseteq C_V \subseteq C_T$$

(that is, all actuator cells are voted, but no sensor cells are). 5

Each execution step in the replicated machine takes place in two stages. In the 
first stage, each working component machine performs a single (ordinary) step. This 
is specified by the function 

• $\mathit{sstep} : \mathcal{R} \times C \to \mathcal{R}$

where

$$\neg\mathcal{F}(i)(\mathit{when}(c)) \supset \mathit{sstep}(\rho, c)(i) = \mathit{step}(\rho(i), c).$$

This definition states that a working component machine updates its own state in 
exactly the same way the unreplicated system model would, given the same state. 
Two important consequences of this definition may not be obvious: 

• If cell $c$ is a sensor cell, then the value of $\mathit{step}(\rho(i), c)$ is

$$\rho(i) \text{ with } [c := \mathit{sensor}(c)(\mathit{when}(c))]$$

(this comes from the definition of step). Note that the expression in the
with clause is independent of the machine $i$; thus, as noted above, our model
requires that all working machines get exactly the same sensor samples.

• If machine $i$ is failed when execution of cell $c$ should be performed, we
know nothing whatsoever about the subsequent state of that machine, i.e.,
$\mathit{sstep}(\rho, c)(i)$. We do not assume merely that the value stored for cell $c$ could
be incorrect; we allow the whole state (of that machine) to be damaged or
destroyed.

When a voted cell is executed, the working component machines each calculate 
the majority vote of the full set of all their individual results. This is specified by 
the function 

5 Sensor cells are not voted because we assume an underlying Byzantine fault-tolerant distribution 
mechanism which ensures that all working machines get the same sensor samples. This assumption 
is captured in the definition of the function sstep.

• $\mathit{vote} : \mathcal{R} \times C \to \mathcal{R}$

where

$$\neg\mathcal{F}(i)(\mathit{when}(c)) \supset \mathit{vote}(\rho, c)(i) = \rho(i) \text{ with } [c := \mathit{maj}\{\rho(j)(c) \mid j \in R\}],$$

maj is the “majority” function, and $\{\rho(j)(c) \mid j \in R\}$ denotes the bag (multiset) of
values recorded for cell $c$ by all the component machines. 6

As with the sstep function, we know absolutely nothing about the state of a failed
component machine after a vote in which it should have participated. Another
interesting element of this definition is that all working machines are specified to perform
a majority vote on the same bag of values: this suggests they must not only read
each other's values correctly, but they should agree on the values attributed to faulty
components. These are precisely the requirements that “Byzantine agreement” (also
known as “interactive consistency”) algorithms are required to satisfy. It may seem,
therefore, that any realization of this model should employ a Byzantine agreement
algorithm to distribute the values to be voted among all of the component machines.
This is unnecessary, however, since it is a majority vote that is being computed, and
our results will establish that the good values comprise a majority. Thus, the values
ascribed to failed processors are irrelevant, and the working processors do not, in
fact, need to agree on those values. We do not prove this result here; we regard it
as a proof obligation on the implementation.
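
The informal argument above can be illustrated with a few lines of Python. This
is a sketch under assumptions, not a proof: maj is rendered as a function that
returns None where the majority is undefined, and the scenario (three machines,
two of them working and holding the value 20) is hypothetical. Whatever value
each working machine attributes to the failed one, the computed majority is the
same.

```python
from collections import Counter


def maj(values):
    """Return the value held by an absolute majority of the bag `values`,
    or None when no such value exists (maj is a partial function)."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else None


good = [20, 20]          # values held by the two working machines
views = [-1, 7, 20, 0]   # values an observer might attribute to the failed one
results = [maj(good + [v]) for v in views]
```

Every entry of results equals 20, since the two good values already form an
absolute majority of three.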

The overall behavior of the replicated machine is specified by the function 

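• $\mathit{rstep} : \mathcal{R} \times C \to \mathcal{R}$

which is simply the appropriate combination of the two steps above:

$$\mathit{rstep}(\rho, c) \stackrel{\mathrm{def}}{=} \begin{cases} \mathit{vote}(\mathit{sstep}(\rho, c), c) & \text{if } c \in C_V \\ \mathit{sstep}(\rho, c) & \text{otherwise.} \end{cases}$$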

Functions rrun and rrunto are defined analogously to the single machine case: 7 

• $\mathit{rrun} : M \to \mathcal{R}$,


6 Note that maj is a partial function: it is undefined if an absolute majority of components do
not agree, and our results will always take care to establish conditions in which it is defined.
An algorithm for computing the majority function was discovered by Boyer and
Moore during the SIFT project [57].

7 Readers unfamiliar with higher-order logic may find the so-called “Curried” functions that
we employ somewhat strange. Rather than the Curried application $\mathit{rrun}(m)(i)(c)$, they might
prefer the application of a function with multiple arguments: $\mathit{rrun}(m, i, c)$. The advantage of our
approach is that the separate components of the application have individual meaning and can be
used separately: $\mathit{rrun}(m)$ is the state of the replicated machine after $m$ steps, $\mathit{rrun}(m)(i)$
is the state of the $i$'th component machine at that point, and $\mathit{rrun}(m)(i)(c)$ is the value stored for
cell $c$ in that component machine at that point.

is given by

$$\mathit{rrun}(0) \stackrel{\mathrm{def}}{=} (\lambda i : \mathit{run}(0)),$$

$$\mathit{rrun}(m + 1) \stackrel{\mathrm{def}}{=} \mathit{rstep}(\mathit{rrun}(m), \mathit{sched}(m + 1))$$

and

• $\mathit{rrunto} : C \to \mathcal{R}$

by

$$\mathit{rrunto}(c) \stackrel{\mathrm{def}}{=} \mathit{rrun}(\mathit{when}(c)).$$


Notice that our model assumes that computation and voting are atomic and that
the components of the replicated machine are completely synchronous. These are
idealizations of reality and we intend to explore more realistic assumptions in later
work. They are adequate, however, for the purpose of the current investigation,
where we are primarily concerned to develop the conditions under which majority
voting successfully masks transient failures.
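
To make the replicated model concrete, the following Python sketch simulates a
small N-plex. It rests on several assumptions that are not part of the formal
model: the three-cell system and its functions are hypothetical, the failure
pattern is passed as a function failed(i, m), and a failed machine is given a
fixed GARBAGE state as a stand-in for "we know nothing about its state" (the
model itself allows any state at all). The intermediate cell t1 is voted, so a
machine that suffers a transient failure while the sensor is sampled still
delivers the correct actuator value.

```python
from collections import Counter

SCHED = ["s1", "t1", "a1"]                           # hypothetical schedule
WHEN = {c: m for m, c in enumerate(SCHED, start=1)}
SENSOR = {"s1": lambda m: 10 * m}
TASK = {"t1": lambda st: 2 * st["s1"], "a1": lambda st: st["t1"] + 1}
VOTED = {"t1", "a1"}                                 # actuators voted, sensors not
INITIAL = {c: 0 for c in SCHED}
GARBAGE = {c: -999 for c in SCHED}                   # stand-in for an unknown state


def step(state, c):
    value = SENSOR[c](WHEN[c]) if c in SENSOR else TASK[c](state)
    return {**state, c: value}


def run(m):
    state = INITIAL
    for c in SCHED[:m]:
        state = step(state, c)
    return state


def maj(values):
    """Absolute-majority value, or None when there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else None


def rstep(rho, c, failed):
    """One replicated step: every working machine steps, then votes if c is voted."""
    m = WHEN[c]
    after = {i: (dict(GARBAGE) if failed(i, m) else step(st, c))
             for i, st in rho.items()}
    if c not in VOTED:
        return after
    winner = maj([st[c] for st in after.values()])   # same bag for every machine
    return {i: (dict(GARBAGE) if failed(i, m) else {**st, c: winner})
            for i, st in after.items()}


def rrun(m, r, failed):
    """Replicated state after m steps, for machines numbered 1..r."""
    rho = {i: dict(INITIAL) for i in range(1, r + 1)}
    for c in SCHED[:m]:
        rho = rstep(rho, c, failed)
    return rho


# Machine 2 suffers a transient failure while the sensor cell executes.
rho = rrun(3, 3, lambda i, m: (i, m) == (2, 1))
```

In this run machine 2 never repairs its unvoted sensor value, but the vote on t1
replaces its bad intermediate result, so all three machines agree with the
fault-free value run(3)["a1"].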

2.3 Fault Tolerance and Transient-Recovery 

Our goal in this section is to show that, under certain conditions concerning the
failure “pattern” $\mathcal{F}$, the replicated machine produces the same actuator behavior as
the single machine, despite failures among the components of the replicated machine.
Our requirement is that the majority-voted value for each actuator should be the
correct value, that is, the value produced by a single fault-free system. In our
model, actuator cells are voted, so that any nonfaulty component machine will set
its own value for an actuator cell to that of the majority. Thus, the correctness
statement can be rephrased as the requirement that the value computed for an
actuator cell by any nonfaulty component machine should be the correct value.

We can state the condition that a component machine $i$ have the correct value
for cell $c$ in terms of a predicate

• $\mathit{good\text{-}value} : R \times C \to \{T, F\}$

where

$$\mathit{good\text{-}value}(i, c) \stackrel{\mathrm{def}}{=} \mathit{rrunto}(c)(i)(c) = \mathit{runto}(c)(c).$$


We then seek a predicate

• $\mathit{safe} : C \to \{T, F\}$

such that

$$\forall c \in C_A, i \in R : (\mathit{safe}(c) \wedge \neg\mathcal{F}(i)(\mathit{when}(c))) \supset \mathit{good\text{-}value}(i, c).$$

Intuitively, safe(c) will capture the conditions under which the replicated machine
has enough working components, and those components have been working for long
enough since their last failure, that good values form a majority and faults will be
masked successfully.

If only actuator cells were voted, it would be trivial to derive the required result: 
safe(c) would be the condition that a majority of components have been working 
continuously since the very first cell through the computation and vote of cell c. That 
this condition is sufficient follows from the fact that working component machines 
given the same inputs produce the same results as each other; failed machines can 
produce anything (including nothing). Thus, the continuously working machines 
will agree among themselves at every voting stage and, since they are hypothesized 
to be in the majority, leave their states unchanged. Since actuator cells are voted, 
any machine that is working during the vote of an actuator cell will acquire the 
correct value from this continuously correct majority.

To see that this condition is necessary, suppose that there has not been a majority
of components working continuously since the beginning. Then a majority of
machines have failed at some time or other prior to the execution of cell c. When
they failed, they may have destroyed their system state. Since we are now assuming
no votes other than at actuators (and actuators do not provide input to other cells),
this corruption may persist even after a failed machine starts working again. Thus a
failed machine cannot be guaranteed ever to recover fully. Since these machines are
hypothesized to form a majority by the time cell c is executed, they could outvote
the good machines at that point.

Without intermediate voting of state values, a component machine that suffers a
transient failure may never fully recover, since there is no way for it to repair its state
data. Intermediate voting can allow this repair to take place, so that the conditions
in the predicate safe become less Draconian. There are many possible strategies for
intermediate voting: we can vote at every cell or only at certain cells, and we can
vote the entire state, or just some portions of it, or just the value computed at that
cell. Voting more data or voting more often than required can be very expensive,
using up resources that could be put to better use. Early DFCS maintained very
little state data and it was feasible to vote the entire state every frame. Modern
systems maintain much more information and it is necessary to be more sparing
in the frequency of voting, and in the quantity of data voted. Obviously there
is a trade-off here: voting less frequently, or less data at each vote, may increase
the time taken to recover from transients, and thereby reduce the reliability of the
system. Clearly, overall reliability depends upon the relationship between the voting
strategy, the fault arrival rate, and the dataflow dependencies in the system. We
need to encode this relationship as the condition in the predicate safe. Intuitively,
the condition must ensure that, for every cell, a majority of machines have been
working for long enough since their last failure that they have acquired correct
values (from sensor samples or votes) for data values that ultimately contribute to
the value of cell c, and have computed all intermediate values correctly. Stating this
condition formally requires some additional definitions.

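We define

$\mathit{foundation}: C \to \mathcal{P}(C)$, where $\mathcal{P}$ denotes powerset,

recursively as follows:

$$\mathit{foundation}(c) \stackrel{\mathrm{def}}{=} \begin{cases} \{c\} & \text{if } c \in (C_S \cup C_V) \\ \{c\} \cup \bigcup_{(b,c) \in G} \mathit{foundation}(b) & \text{otherwise} \end{cases}$$

and

$\mathit{support}: C \to \mathcal{P}(C)$

by

$$\mathit{support}(c) \stackrel{\mathrm{def}}{=} \begin{cases} \{c\} \cup \bigcup_{(b,c) \in G} \mathit{foundation}(b) & \text{if } c \in C_V \\ \mathit{foundation}(c) & \text{otherwise.} \end{cases}$$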

The foundation of a cell c consists of all those cells that directly or indirectly
contribute input data to c by a path that does not pass through any (other) voted cells.
Note that a voted or sensor cell is its own foundation.

Figure 2.1 gives a graphical representation of these concepts. In the figure, circles
indicate cells, double circles indicate voted cells, and the arrows indicate dataflow
dependencies (the arrow from cell D to cell A represents the arc $(A, D) \in G$; the
direction of the arrowhead indicates the dependency relation, rather than the flow
of data). The left-to-right position of cells on the page suggests the order in which
they are executed. In this case, the foundation for cell J is just {J} (since J is
a voted cell), that for A is {A} (since A is a sensor cell), and that for cell D
is {A, C, D}.

The support for a nonvoted cell is simply the foundation for that cell; the support
for a voted cell is the cell itself together with the union of the foundations of all the
cells that directly provide input to that cell. The intuition here is that if a machine
computes correct values for all the cells in support(c), and if the machine keeps
working, then the value eventually computed for cell c will be correct. In Figure 2.1,
the supports for A and D equal their foundations, whereas the support for the voted
cell J is {A, C, D, J}. A machine that is working throughout the support of cell J
will compute the correct value for that cell: since it is working at sensor cell A, it
will acquire the correct sample value from that sensor; since it is working at voted
cell C, it will acquire the correct value for that cell during its majority vote, even if
it had been failed earlier and had not computed the right value itself;⁸ since it has
the correct input values for cell D and is working at that cell, it will compute the
correct output value; and since it has (from D) the correct input value for cell J,
and is working at J, it will compute the correct value for J.

Figure 2.1: Example Dataflow Dependency Graph
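
The definitions of foundation and support can be sketched in Python as follows. This is only an illustration under stated assumptions: the figure is not reproduced here, so the graph below (in particular the sensor cell B and the exact set of arcs) is hypothetical, chosen so that the results agree with the values quoted in the text for cells A, D, and J.

```python
# Hypothetical graph loosely modeled on the description of Figure 2.1.
# Cell B and the exact arcs are assumptions, not taken from the figure.
SENSOR = {"A", "B"}           # C_S: sensor cells
VOTED = {"C", "J"}            # C_V: voted cells
# G holds arcs (b, c): cell b supplies input directly to cell c.
G = {("B", "C"), ("A", "D"), ("C", "D"), ("D", "J")}


def inputs(c):
    """Cells that directly supply input to c."""
    return {b for (b, d) in G if d == c}


def foundation(c):
    """c plus every cell reaching c by a path through no other voted cell."""
    if c in SENSOR or c in VOTED:
        return {c}
    result = {c}
    for b in inputs(c):
        result |= foundation(b)
    return result


def support(c):
    """For a voted cell: c plus the foundations of its direct inputs."""
    if c in VOTED:
        result = {c}
        for b in inputs(c):
            result |= foundation(b)
        return result
    return foundation(c)
```

With this graph, `foundation("D")` is `{"A", "C", "D"}` and `support("J")` is `{"A", "C", "D", "J"}`, matching the example discussed above.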

We need just a few more definitions. The function
• committed-to: $C \to \mathbb{N}$
is defined by


$\mathit{committed\text{-}to}(c) = \min\{\mathit{when}(a) \mid a \in \mathit{support}(c)\}.$

In the example of Figure 2.1, committed-to(J) = when(A). Once a machine reaches
committed-to(c) in its schedule, it must keep working until when(c) if it is to compute
the correct value for cell c. Conversely, if it does keep working throughout this
period, it will compute the correct value for cell c even if its own state data are
corrupt at the beginning of the period. This is because all the data required to
compute cell c are derived either from sensor samples, or from voted values, that are
acquired at or later than committed-to(c). Thus, provided enough other machines
are working, this machine will acquire good values during the votes and sensor
samples, and its own bad state data will not contribute to the result.

⁸ We are assuming here that enough machines were working correctly at C that correct values
form the majority. We cannot give a characterization of the necessary condition yet, since we are
in the process of developing the concepts that make its statement possible.

The function OK captures the condition under which a particular component
machine has been working for “long enough” since its last fault that any bad state
data values have been replaced by good values through votes and sensor samples, so
that it is able to compute a good result for the current cell. Thus,

• OK: $R \to (C \to \{T, F\})$
is defined by

$\mathit{OK}(i)(c) \stackrel{\mathrm{def}}{=} (\forall m : \mathit{committed\text{-}to}(c) \le m \le \mathit{when}(c) \supset \neg\mathcal{F}(i)(m)).$

In other words, OK(i)(c) is the condition which ensures that component machine i
has no state data faults that can affect the value computed for cell c.

For the replicated machine to be safe, a majority of its components must be OK
for every cell. We therefore introduce the function

• MOK: $C \to \{T, F\}$

(for Majority OK) defined as follows:

$\mathit{MOK}(c) \stackrel{\mathrm{def}}{=} \exists \Theta \subseteq R, |\Theta| > r/2 : i \in \Theta \supset \mathit{OK}(i)(c).$

We then define the predicate safe as follows:

$\mathit{safe}(c) \stackrel{\mathrm{def}}{=} (\forall a : \mathit{when}(a) \le \mathit{when}(c) \supset \mathit{MOK}(a)).$

That is, the replicated machine is safe at cell c if the condition MOK holds at c
itself and at all cells evaluated earlier than c.
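
To make these predicates concrete, here is a minimal Python sketch of committed-to, OK, MOK, and safe. Everything specific in it is an assumption made for illustration: the five-cell schedule, the supports (consistent with the discussion of Figure 2.1 plus a hypothetical sensor cell B feeding voted cell C), the choice of three machines, and the encoding of the failure pattern as a set `faults` of pairs (i, m) for which machine i is faulty at time m.

```python
# Hypothetical schedule and supports (assumptions for illustration only).
WHEN = {"A": 1, "B": 2, "C": 3, "D": 4, "J": 5}
SUPPORT = {
    "A": {"A"}, "B": {"B"}, "C": {"B", "C"},
    "D": {"A", "C", "D"}, "J": {"A", "C", "D", "J"},
}
R = (0, 1, 2)      # component machines, so r = 3


def committed_to(c):
    """Earliest execution time of any cell in support(c)."""
    return min(WHEN[a] for a in SUPPORT[c])


def ok(i, c, faults):
    """OK(i)(c): machine i is fault-free from committed-to(c) through when(c)."""
    return all((i, m) not in faults
               for m in range(committed_to(c), WHEN[c] + 1))


def mok(c, faults):
    """MOK(c): more than r/2 machines are OK for c."""
    return 2 * sum(ok(i, c, faults) for i in R) > len(R)


def safe(c, faults):
    """safe(c): MOK holds at c and at every cell scheduled no later than c."""
    return all(mok(a, faults) for a in WHEN if WHEN[a] <= WHEN[c])
```

For instance, with `faults = {(0, 1)}` machine 0 is OK for C (whose support starts at B, after the fault) but not for J (whose support reaches back to A), which is the transient-recovery behavior the definitions are meant to capture.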

Now we can state and prove our main theorem. This “Consensus Theorem” is
similar to lemmas of that name in [9].

Theorem 1 (Consensus Theorem) If safe(c), then

$\forall j \in R : \mathit{OK}(j)(c) \supset \mathit{good\text{-}value}(j, c).$

Proof: The proof is by strong induction on when(c). The basis is the case
when(c) = 1, in which case c must be a sensor cell, and so

$\mathit{rrunto}(c)(j)(c) = \mathit{sensor}(c)(1) = \mathit{runto}(c)(c)$

as required.

For the inductive step, suppose the theorem true for all cells a such that
when(a) < when(c) and let j be a component machine such that OK(j)(c). If
$c \in C_S$, the argument is the same as for the basis case, and so we consider $c \in C_T$
and consider a such that $(a, c) \in G$. Since the result of c is a function of its inputs,
the result will follow if we can demonstrate

good-value(j, a).


There are two cases to consider. 

Case 1: $a \in C_V$. It may not be that OK(j)(a) and so we cannot appeal to the
inductive hypothesis directly, but we do know that MOK(a) and hence that
a majority of machines exemplified by k (possibly not including j) satisfy
OK(k)(a). By the inductive hypothesis, good-value(k, a) for these machines.
Now, we hypothesized OK(j)(c) and hence, since a belongs to support(c),
$\neg\mathcal{F}(j)(\mathit{when}(a))$. It follows that during
the voting stage of the execution of cell a, machine j will acquire the majority
value for that cell, i.e., good-value(j, a), as required.

Case 2: $a \notin C_V$. A component machine i is OK for cell c if it is working throughout
the period from committed-to(c) to when(c). Observe that the support of a
nonvoted cell a is a subset of the support of any cell c to which it provides input.
It follows that committed-to(a) can be no earlier than committed-to(c). We must also
have when(a) < when(c). Thus $\mathit{OK}(i)(c) \supset \mathit{OK}(i)(a)$ and the result then
follows directly from the inductive hypothesis.


□ 

The result we seek follows from the Consensus Theorem: 

Corollary 1 For $c \in C_A$, if safe(c) then

$\forall i \in R : \neg\mathcal{F}(i)(\mathit{when}(c)) \supset \mathit{good\text{-}value}(i, c).$

Proof: The statement of the corollary implies MOK(c), so there must exist $j \in R$
such that OK(j)(c). The Consensus Theorem then supplies

$\forall j \in R : \mathit{OK}(j)(c) \supset \mathit{good\text{-}value}(j, c)$

which, on expanding the definition of good-value, gives

$\mathit{rrunto}(c)(j)(c) = \mathit{runto}(c)(c).$

Now $c \in C_A$, so c is a voted cell, and the definition of the voting function ensures

$\forall i, j \in R : (\neg\mathcal{F}(i)(\mathit{when}(c)) \wedge \neg\mathcal{F}(j)(\mathit{when}(c))) \supset \mathit{rrunto}(c)(i)(c) = \mathit{rrunto}(c)(j)(c),$

since all working machines acquire the majority value as the result of voted cells. By
definition, $\mathit{OK}(j)(c) \supset \neg\mathcal{F}(j)(\mathit{when}(c))$. Hence, for any $i \in R$ such that $\neg\mathcal{F}(i)(\mathit{when}(c))$,

$\mathit{rrunto}(c)(i)(c) = \mathit{rrunto}(c)(j)(c) = \mathit{runto}(c)(c)$

and we conclude good-value(i, c) as required. □

In words, the corollary states that each working component of the replicated
machine computes the correct value for an actuator if a majority of machines is
working throughout the period from committed-to(c) to when(c) for each cell c in
the schedule up to and including the actuator concerned.
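
The Consensus Theorem and its corollary can be exercised on a toy system with the following Python sketch. It is not the model of Section 2.2: the graph, schedule, sensor and task functions, the treatment of a failed machine (its whole state is overwritten with arbitrary values), and the voting step (working machines adopt an absolute-majority value when one exists) are all simplifying assumptions made for this illustration.

```python
import random
from collections import Counter

# Hypothetical five-cell system (all names and values are assumptions).
SENSOR, VOTED = {"A", "B"}, {"C", "J"}
INPUTS = {"A": [], "B": [], "C": ["B"], "D": ["A", "C"], "J": ["D"]}
SCHED = ["A", "B", "C", "D", "J"]                 # execution order
WHEN = {c: m for m, c in enumerate(SCHED, start=1)}
R = range(3)                                      # three component machines


def foundation(c):
    if c in SENSOR or c in VOTED:
        return {c}
    return {c}.union(*(foundation(b) for b in INPUTS[c]))


def support(c):
    if c in VOTED:
        return {c}.union(*(foundation(b) for b in INPUTS[c]))
    return foundation(c)


def ok(i, c, faults):
    start = min(WHEN[a] for a in support(c))      # committed-to(c)
    return all((i, m) not in faults for m in range(start, WHEN[c] + 1))


def safe(c, faults):
    return all(2 * sum(ok(i, a, faults) for i in R) > len(R)
               for a in SCHED if WHEN[a] <= WHEN[c])


def compute(c, state):
    """Value a working machine computes for c from its own state."""
    if c in SENSOR:
        return 10 * WHEN[c]                       # stand-in sensor sample
    return 1 + sum(state[b] for b in INPUTS[c])   # stand-in task function


def run_single():
    state = {}
    for c in SCHED:
        state[c] = compute(c, state)
    return state


def run_replicated(faults, rng):
    """Return {(i, c): value machine i holds for c just after executing c}."""
    junk = lambda: {a: rng.randrange(1000) for a in SCHED}
    states = [junk() for _ in R]                  # arbitrary initial states
    seen = {}
    for c in SCHED:
        m = WHEN[c]
        for i in R:
            if (i, m) in faults:
                states[i] = junk()                # a failed machine may do anything
            else:
                states[i][c] = compute(c, states[i])
        if c in VOTED:
            value, count = Counter(s[c] for s in states).most_common(1)[0]
            if 2 * count > len(R):                # absolute majority exists
                for i in R:
                    if (i, m) not in faults:
                        states[i][c] = value
        for i in R:
            seen[(i, c)] = states[i][c]
    return seen


def theorem_holds(faults, rng):
    good, seen = run_single(), run_replicated(faults, rng)
    consensus = all(seen[(j, c)] == good[c] for c in SCHED
                    if safe(c, faults) for j in R if ok(j, c, faults))
    corollary = all(seen[(i, "J")] == good["J"] for i in R
                    if safe("J", faults) and (i, WHEN["J"]) not in faults)
    return consensus and corollary
```

Calling `theorem_holds` on randomly generated fault sets should always return `True`; a `False` result would indicate that the sketch, not the theorem, is at fault, since the sketch only approximates the formal model.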

In Chapter 3, we consider the formal specification of this model in Ehdm, and
the mechanically checked verification of the results derived above.



Chapter 3 


Specification and Verification 
in EHDM 


In this chapter we give an overview of the formal specification and verification in
Ehdm of the model presented in Chapter 2. It is not our purpose to provide a general
introduction to Ehdm here; readers unfamiliar with the Ehdm language and system
are referred to von Henke and Rushby [7]. Our purpose is rather to discuss some of
the more interesting issues raised by the formalization, and to provide a road map
to the complete listings of the Ehdm specification and verification, which are given
in the Appendices. The LaTeX-printed Ehdm specification is given in Appendix
A; a cross-reference from identifiers to the module in which they are declared is
given in Appendix B; Appendix C reproduces the summary from the Ehdm proof-chain
analysis for the result corresponding to Corollary 2. All the material in the
Appendices was generated directly by the Ehdm system.

Since the specification language of Ehdm is a rather rich, strongly typed higher-order
logic, it was possible to cast the model presented in the previous chapter into
Ehdm fairly directly. The specification of the basic process-control model is given
in the module simple_machine (page 59). The semantic subtypes of Ehdm allowed
us to specify the various types of cell in a very natural and convenient manner;
for example, $C_T$, the type corresponding to the active-task cells, is specified as the
subtype of C (the type of all cells) satisfying $(\lambda c : \mathit{cell\_type}(c) \neq \mathit{sensor\_cell})$. We
can then define the signature of the function task as $C_T \to \mathit{task\_fn}$ and the Ehdm
system will ensure that applications of the form task(c) occur only in contexts where
c can be proven to satisfy the subtype predicate for $C_T$. These proof obligations
are called type-correctness conditions (or tcc's for short) and are placed in
system-generated modules whose names end in _tcc. The definition of the function step, for
example, causes two such tcc's to be generated in the module simple_machine_tcc
(page 62). This latter module contains several other tcc's, including three that
are required to demonstrate the nonemptiness of the subtypes introduced, one that
is necessary to demonstrate the well-foundedness of the recursive definition for the
function run,¹ and two others that are similar to those just discussed for step. Ehdm
provides a tool called the proof-chain analyzer that checks whether a verification is
complete. Among the conditions that it enforces is the requirement that all tcc's be
proved.

System-generated tcc modules automatically include trivial proof declarations
for the formulas concerned. When these automatically generated proof declarations
do not suffice to establish their corresponding theorem, the user must construct
more elaborate proof declarations in another module. (Being system generated
and crucial to the type-correctness of the specification, tcc modules are protected
against modification by the user.) The three such declarations needed in this case
are given in the module simple_machine_tcc_proofs (page 64). A similar naming
convention is applied to other modules containing proofs for tcc's. In order to
satisfy the nonemptiness requirements on subtypes, we introduce three constants
corresponding to an arbitrary sensor, actuator, and active-task cell respectively
(strictly, the last of these is unnecessary, since actuators are also active tasks). In any
application of the specification, instantiations for these constants must be supplied.
We do not define the relation G in the Ehdm specification; the simpler relation G 
is sufficient to state and derive all the results required. We introduce initial_state
as an arbitrary constant of type state to serve as the initial value in the recursive
definition for the function run.

The rest of the specification in module simple_machine is a fairly direct
transliteration of that given in Section 2.1, with one exception: the Ehdm specification has
an extra argument for the function step. This was intended to allow for the
description of systems with a less rigid scheduling model than that eventually employed.
Thus, whereas Section 2.1 has

$\mathit{step}(\sigma, c) = \sigma \text{ with } [c := \text{if } c \in C_S \text{ then } \mathit{sensor}(c)(\mathit{when}(c)) \text{ else } \mathit{task}(c)(\sigma)],$

the Ehdm specification has

$\mathit{step}(\sigma, c, m) = \sigma \text{ with } [c := \text{if } c \in C_S \text{ then } \mathit{sensor}(c)(m) \text{ else } \mathit{task}(c)(\sigma)].$

However, this latter version of the function is always used in the form
$\mathit{step}(\sigma, c, \mathit{when}(c))$, so that it is equivalent to the first version.
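
The relationship between the two forms of step can be illustrated with a short Python sketch. The two-cell system, the `sensor` and `task` stand-ins, and the names `step2` and `step3` are hypothetical; only the shape of the definitions follows the text.

```python
# Hypothetical two-cell system: sensor cell "s" feeds task cell "t".
SENSOR = {"s"}
WHEN = {"s": 1, "t": 2}


def sensor(c, m):
    return (c, m)                  # stand-in for the sample of sensor c at time m


def task(c, state):
    return ("f", state.get("s"))   # stand-in task function of the current state


def step3(state, c, m):
    """Three-argument form: the sample time m is passed explicitly."""
    return {**state, c: sensor(c, m) if c in SENSOR else task(c, state)}


def step2(state, c):
    """Two-argument form, recovered by always passing m = when(c)."""
    return step3(state, c, WHEN[c])
```

Because every use supplies `WHEN[c]` as the third argument, the extra generality of `step3` is never exercised, which is why the two forms are equivalent in practice.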

The module simple_props (page 68) states and proves some simple consequences
of the previous definitions that are needed later. One, stay_correct_simple, is an
example of the type of condition that is often glossed over in conventional
mathematical presentations, such as that in Chapter 2. It states that if the output of cell
a is used as an input to cell c, then the value recorded for a immediately after it is
computed will still be the same when it is accessed (possibly much later) in order
to be used in the computation of c. In the case of the simple machine, this result
is straightforward; it is less so in the case of the replicated machine (since failures
must be accounted for). In either case, this is the step that will require a modified
proof if the specification is adjusted to model systems that do not keep all cell values
for all time (see page 47).

¹ The annotation “...by identity” in the recursive definition of run establishes identity as
the measure function for the recursion. The value of the measure function is required to be strictly
decreasing across recursive calls, and a tcc is generated to ensure that this is so.

The proof of stay_correct_simple is by induction. The particular form of
induction used is a variant of simple induction over the natural numbers. This
is stated as the higher-order theorem induction_m in the module natinduction
(page 66). This module states two other induction schemes; all three are derived
from a statement of Noetherian induction given by the axiom general_induction
in the module noetherian (page 65). Note that general_induction is the only
induction scheme stated as an axiom; all the others are theorems derived from this
single axiom. Notice, too, that the module noetherian has assumptions (stated in
the assuming clause) that must be discharged in any instantiation. The module
natinduction discharges these assumptions for its particular instantiation.

The next three modules, sets (page 71), cardinality (page 72), and 
orderedsets (page 74), introduce concepts related to sets that are needed in order 
to state the model for the replicated machine. Sets are modeled by their character- 
istic predicates; the type of (the predicate representing) a given set is dependent on 
the type supplied as the actual parameter to the sets module. The sets module 
defines the basic set operations of union, intersection, subset, and the like, as higher- 
order functions. Those unfamiliar with the use of higher-order logic in specifications 
may find these definitions particularly interesting. 

The module cardinality introduces the notion of the cardinality (size) of a set
and defines some of its properties axiomatically. Some of the axioms we use, for
example

$|a \cup b| + |a \cap b| = |a| + |b|,$

are valid only for finite sets. Accordingly, an assumption is attached to this module
to ensure that only finite types may be supplied as its actual parameter. The
Ehdm proof-chain analyzer checks that module assumptions are discharged in any
instantiations before the overall verification is declared complete.

The module orderedsets defines the function min (the value of the smallest 
element) on sets whose elements are drawn from a type with a suitable ordering 
relation. 
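
As a rough analogy for readers more familiar with programming than with higher-order logic, the three set modules can be mimicked in Python by representing a set as its characteristic predicate. This is a sketch only: the finite `UNIVERSE` standing in for the type parameter and all function names are assumptions, and the Ehdm modules state these properties axiomatically rather than computing them.

```python
UNIVERSE = range(6)   # hypothetical finite type supplied as the parameter


def union(a, b):
    """Higher-order: takes two predicates, returns a predicate."""
    return lambda x: a(x) or b(x)


def intersection(a, b):
    return lambda x: a(x) and b(x)


def subset(a, b):
    return all(b(x) for x in UNIVERSE if a(x))


def card(a):
    """Cardinality, meaningful here because UNIVERSE is finite."""
    return sum(1 for x in UNIVERSE if a(x))


def set_min(a):
    """Smallest element of a nonempty set over an ordered type."""
    return min(x for x in UNIVERSE if a(x))


evens = lambda x: x % 2 == 0
small = lambda x: x < 3
```

With these definitions, `card(union(evens, small)) + card(intersection(evens, small))` equals `card(evens) + card(small)`, an instance of the cardinality axiom quoted above.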

The replicated system model is developed in the module repl_machine (page 75).
The specification follows very closely that given in Section 2.2. As with the step
function of simple_machine, the functions vote, sstep, and rstep all take a third
argument in the Ehdm specification, but are always used in a manner that is
consistent with the two-argument forms given earlier. Another slight difference is in the
specification of the condition that majority voting is performed only for voted cells.
In the Ehdm specification, this is given in the axiom for the vote function, rather
than in the definition of rstep. The two approaches are obviously equivalent, but if
we were to revise the Ehdm specification, we would change it to the alternate form
used in Section 2.2. The form currently employed suggests that the voter is always
applied, but only actually does a vote when the cell is a voted one; it would be more
natural to specify that the voter always votes, but is applied only when the cell is a
voted one.

The required property of the maj (majority vote) function is specified in the
axiom maj_ax. Note that by specifying this function relative to a set of component
machines, rather than relative to the values recorded by their states, we avoid the
need to introduce the concept of a multiset. The majority vote function used in
Section 2.2 is a partial function: it is undefined if an absolute majority does not
exist. Functions in Ehdm are total, however: the maj function, for example, has
some value even when an absolute majority does not exist; we simply know nothing
about what that value may be. In order to make use of maj_ax, the verification must
always establish that the conditions for the existence of an absolute majority are
satisfied. Thus the distinction between a truly partial function and a total function
whose values are unconstrained when applied outside its domain is moot in this
case.²
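
The idea of a total majority function whose value is constrained only when an absolute majority exists can be sketched in Python. The signatures below are assumptions made for illustration (the Ehdm declaration of maj and the exact statement of maj_ax are in the Appendices, not reproduced here); what the sketch shows is that the function is indexed by a set of machines plus a lookup of each machine's value, so no multiset of values is needed.

```python
from collections import Counter


def maj(machines, value_of):
    """Total on nonempty sets: always returns some value held by a machine.

    When no absolute majority exists, the result is simply whichever value
    Counter happens to rank first; callers must not rely on it.
    """
    counts = Counter(value_of(i) for i in machines)
    return counts.most_common(1)[0][0]


def absolute_majority(machines, value_of, v):
    """Assumed hypothesis of the axiom: more than half the machines hold v."""
    machines = list(machines)
    return 2 * sum(value_of(i) == v for i in machines) > len(machines)


agree = {0: 7, 1: 7, 2: 9}     # two of three machines hold 7
split = {0: 1, 1: 2}           # no absolute majority
```

Here `maj([0, 1, 2], agree.get)` is 7 because the hypothesis `absolute_majority([0, 1, 2], agree.get, 7)` holds, whereas `maj([0, 1], split.get)` returns some value about which nothing useful can be concluded.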

Module supports (page 79) introduces the functions foundation, support, and
committed_to that are needed in the statement of the Consensus Theorem.
Subsidiary functions backup and critical_times are used in the definitions.

The module correctness (page 84) defines the functions OK and MOK, the
predicates safe and correct, and states the_result, which corresponds to the
Consensus Theorem, and is the main result proved in the verification. The definition for
safe given in the Ehdm specification is weaker than that given in Section 2.3, and
so the_result is stronger than Theorem 1 of Section 2.3. The difference is that the
formal specification of safe(c) requires only that the replicated machine be MOK
for those cells a that transitively contribute input to c; the definition in Section 2.3,
on the other hand, requires that the replicated machine be MOK for c and for all
cells executed earlier than c. Clearly the cells that transitively contribute input to
c must all be executed earlier than c, and so the second condition implies the first.
The reason we used a stronger definition for safe in the traditional mathematical
presentation than we did in the formal specification is that the stronger definition
allows Theorem 1 to be proved by simple induction over the natural numbers, whereas
the weaker definition requires a proof by Noetherian induction over the structure of
the dataflow dependency graph. Noetherian induction is rather tricky to state and
carry out in quasi-formal notation (and may not be familiar to all readers) and so
we opted for the stronger notion of safe, and hence a weaker theorem, in the
traditional development. In the truly formal notation of Ehdm, it is no more difficult
to perform Noetherian than simple induction, and so we used the definition for safe
that gave the strongest theorem.

² If $f: A \to B$ is a partial function and $x \in A$ a value outside the domain of definition of f,
then the term f(x) has no meaning. There are two ways to capture the useful properties of partial
functions in Ehdm: one is to use a total function with signature $A \to B$, but to specify nothing
about its values outside its domain of definition. In this case, the term f(x) has some value, but
we don't know what it is. Expressions like $x = y \supset f(x) = f(y)$ are meaningful, and true, however.

The other approach is to use a total function with signature $A' \to B$, where $A' \subset A$
is the true domain. The quotient function, for example, is defined this way in Ehdm:
quotient: function[number, nznum -> number], where nznum is the type of nonzero numbers
defined as a subtype of numbers by the predicate $(\lambda x : x \neq 0)$. In this case, the term quotient(x, y) is
type-correct only if it can be proved that $y \neq 0$ in the context of its use.

The module connect (page 87) establishes a crucial lemma called stay_correct
which states that if a is a cell that provides direct input to cell c, and if all component
machines that were OK at a computed the correct value for a, and if the replicated
machine is safe at c, then all component machines that are OK for c will have the
correct value for a available when they execute c. The proof of this lemma involves
a subsidiary lemma called stay_correct_repl that is the analog, for the replicated
machine, of the stay_correct_simple lemma discussed earlier. Like the earlier
lemma, this one is proved by induction, but requires a more complex induction
scheme than the previous case, because the induction must not proceed beyond the
point to which the component machine is known to be OK.³

A key step in the proof of stay_correct is provided by the lemma
torch_carried, which establishes that if cell a provides input to cell c, and if the
replicated machine is safe at c, then there is some component machine that is OK
at both a and c (and hence it “carries the torch” of correct values over from a to
c). The proof of this property is the one place where we depend on the fact that we
are using majority voting (and hence that the intersection of the sets of component
machines OK at a and OK at c must be nonempty).
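
The counting fact behind torch_carried (any two sets that each contain more than half of the r machines must share a member) can be checked exhaustively for small r with a few lines of Python. This is merely a sanity check of the pigeonhole argument, not a rendering of the Ehdm proof; the function names are hypothetical.

```python
from itertools import combinations


def majorities(r):
    """All subsets of {0, ..., r-1} with more than r/2 elements."""
    machines = range(r)
    return [set(s)
            for k in range(r // 2 + 1, r + 1)
            for s in combinations(machines, k)]


def all_majorities_intersect(r):
    """True if every pair of majority sets has a common machine."""
    ms = majorities(r)
    return all(a & b for a in ms for b in ms)
```

By contrast, sets of exactly r/2 machines need not intersect (for r = 4, {0, 1} and {2, 3} are disjoint), which is why the argument depends on a strict majority.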

The three modules sensor_step (page 91), nonvoted_step (page 94), and
voted_step (page 97) establish the three cases for the inductive step in the proof of
the_result (i.e., the Consensus Theorem) in module correctness_proof (page 103).
Unlike the traditional-style proof for the Consensus Theorem given in Section 2.3,
where strong induction over the schedule of cell executions is employed, the
verification in Ehdm uses Noetherian induction on the dependency structure recorded
in the relation G. This is the most natural induction scheme to employ in this
case and, as noted earlier, allows a stronger formulation of the theorem. Since the
statement of the Consensus Theorem has the form of an implication, we actually
employ a specialized form of Noetherian induction called mod_induction that is
tailored to this case. The statement and proof of mod_induction appear in the module
noetherian.

³ The type C of cells is implicitly defined to be infinite by the module simple_machine, since the
when and sched functions constrain it to be bijective with the naturals. The specification would be
improved if the bijection were established with a finite initial subset of the naturals. In this case,
the inductive proof of stay_correct_simple would also require revision.

The three modules concerned with establishing the inductive step for the
proof of the_result each prove a lemma which states, for the case of the cell c
considered (i.e., a sensor cell, a nonvoted active-task cell, and a voted cell,
respectively), that if the replicated machine is safe at c, and correct at all cells a that
provide input to c, then the replicated machine will be correct at c. The proofs of
these results essentially follow from applications of the definitions of the functions
step, sstep, vote, rstep, rrun, and rrunto, but are somewhat tedious in Ehdm
since its theorem prover lacks a rewriter: numerous lemmas are required to break
the proof down into manageable pieces, each involving the application of just one
or two definitions.

Finally, the module outputs contains the specification and proof for the formula
actuators_correct, which corresponds to Corollary 1 in Section 2.3.

The complete verification of the_result requires the mechanized checking of 93
proofs (in addition, there are 9 automatically generated tcc proofs that fail; these
are supplanted by successful proofs among the 93) and takes about 7 minutes on
a Sun SPARCstation 2. The terse proof-chain analysis for the_result is given in
Appendix C. The effort required to formally specify and verify the model in Ehdm
was between three and four man-weeks.


Chapter 4 

Reconciliation with the LaRC 
Model 


In this chapter we explain the connection between our model and that developed 
by Di Vito, Butler and Caldwell of NASA Langley Research Center (LaRC) [9]; for 
brevity, we will generally refer to this as the “LaRC model.” 

A major difference between our model and the LaRC model is that we allocate
the elementary units of activity to a single-level structure of cells, whereas the LaRC
model considers a hierarchy of subframes, frames and cycles (in ascending order).
Thus, in our model, cells are drawn from a simple type C, whereas in the LaRC
model the units of activity (which we will call “LaRC-cells”) are represented by
triples which we write as [p, f, s], where p is the cycle,¹ f is the frame, and s is the
subframe. There are an indefinite number of cycles, M frames, and frame f has
$M_f$ subframes. If we let $\mathbb{N}_k$ denote the first k natural numbers, then we require
$p \in \mathbb{N}$, $f \in \mathbb{N}_M$, and $s \in \mathbb{N}_{M_f}$.² The sequence of frames repeats to form cycles;
hence the properties of the LaRC model are primarily specified in terms of the last
two components of the LaRC triples. Dataflow dependencies are represented by a
relation $\to$ on these pairs, where

$[f, s] \to [g, t]$

means that subframe s of frame f supplies input to subframe t of frame g. If

$f > g \vee (f = g \wedge s > t)$

then the input comes from [f, s]'s execution in the previous cycle. The directed
graph associated with $\to$ is called the task graph.

¹ We use the variable p, suggesting period, rather than c, suggesting cycle, to avoid confusion
with c as a cell.

² This is an example of a dependent type: a type that depends on the value of a variable. Ehdm
has dependent typing, but lacks a syntax for stating the product type required here. A more
advanced specification language under development at SRI permits this type definition to be stated
directly.

The second major difference between the LaRC model and ours is that we
associate voting with individual cells, whereas the LaRC model treats voting as a
separate activity performed at the end of each complete frame. The LaRC model
employs a predicate VP (for Voting Pattern) to indicate what results are to be voted
in each frame: VP(f, s, g) is true just in case the result of subframe s in frame f is
voted at the end of frame g.

The association of votes with frames in the LaRC model renders it strictly weaker 
than our model: we can model any system that can be represented within the LaRC 
model, but we can also model systems (for example, those having votes elsewhere 
than at the end of the frame) that cannot be represented within the LaRC model. 
In order to substantiate the first of these claims (the second is self-evident), we now 
indicate how the LaRC model can be represented within our formulation. 

To do this, we introduce a new “voting” cell at the end of every frame in the
LaRC task graph and, to a first approximation, we add an arc to the task graph
between each (regular) cell and the voting cell of the frame that votes that cell's
value; we also replace those dataflow references to the value of the original cell made
by cells scheduled in frames later than one that votes its value by references to the
value of the voting cell. We say “to a first approximation” because the process is complicated
when a value is voted by more than one frame. In this case, the voting cells of the
later frames vote on the previously voted value, not on the value of the original cell;
similarly, any references to the value always retrieve the most recently voted version.
(This is because there really is only one copy of the value.)

Figure 4.1 gives a pictorial representation of the transformation just described. 
In the figure, vertical dashed lines indicate frame boundaries, and the left to right 
order of cells on the page suggests their temporal order of execution. The top image 
portrays an unvoted system with three frames and two subtasks in each frame; the 
numbered arcs indicate the dataflow dependencies. The lower two images portray 
the system after transformation to frame-based voting systems. The double circles 
represent the new voting tasks and the unnumbered arcs that curve below the line 
of circles represent the dataflow dependencies of these new voting tasks. The middle 
image portrays “continuous voting” (see Section 4.1.1), in which all data are voted 
every frame — hence each voting task has a link back to the previous voting task in 
order to access the previously voted values of earlier tasks. Arcs corresponding to the 
original dataflow references retain the same numbering scheme in this transformed 
portrayal. Observe that arc number 7, for example, no longer reaches back to 
a task several frames earlier, but only to the previous voting task. The bottom 
image portrays the system after transformation to a frame-based voting system 
using “cyclic voting” (see Section 4.1.2), in which each frame votes only the data 
generated in that frame. Here, arc 7 must still reach back to the frame containing
the task of interest, but the data is acquired from the voting task of the frame
concerned.

Formal description of the transformation is complicated by the need to take care
of the details. We identify the cells of our model with the triples $[p, f, s]$ of the
LaRC task graph, together with an initialization cell and the special voting cells;
we denote the initialization cell by $c_I$, and the voting cell at the end of frame $g$ of
cycle $q$ by $v_{(q,g)}$. The basic dataflow connections $\rightarrow$ of the LaRC task graph give
rise to edges in our graph $G$ as follows.

\[
([p,f,s],[q,g,t]) \in G \ \text{ iff } \ [f,s] \rightarrow [g,t] \ \text{ and } \
\bigl(p = q \land (f < g \lor (f = g \land s < t))\bigr) \lor
\bigl(p = q - 1 \land (f > g \lor (f = g \land s > t))\bigr).
\]

Cells that would otherwise be dependent on cycle $-1$ instead make reference to the
initialization cell:

\[
(c_I,[0,g,t]) \in G \ \text{ iff } \ [f,s] \rightarrow [g,t] \land (f > g \lor (f = g \land s > t)).
\]

The execution schedule for the LaRC model is implicit in the frame structure:
all the subframes for frame 0 are executed in order, then those for frame 1, and
so on until the last subframe of frame $M - 1$, at which point a new cycle starts over
at subframe 0 of frame 0. If we let $K(f) = \sum_{g=0}^{f-1} M_g$ denote the total number of
subframes in the first $f$ frames of the task graph, then we require

\[
\begin{aligned}
\textit{when}(c_I) &= 0, \\
\textit{when}([p,f,s]) &= p \times (K(M) + M) + (K(f) + f) + s + 1, \text{ and} \\
\textit{when}(v_{(q,g)}) &= q \times (K(M) + M) + (K(g) + g) + M_g + 1.
\end{aligned}
\]
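
As an informal illustration (not part of the formal development), the schedule positions just defined can be computed directly. The following Python sketch assumes a hypothetical list `subframes_per_frame` whose entry g plays the role of $M_g$; the function names are ours. It checks that the positions assigned by when enumerate the execution sequence without gaps or collisions.

```python
from itertools import accumulate


def make_when(subframes_per_frame):
    """Build the two 'when' functions for a cyclic schedule.

    subframes_per_frame[g] is M_g, the number of subframes in frame g
    (a hypothetical input format chosen for this sketch).
    """
    M = len(subframes_per_frame)
    # K[f] = total number of subframes in the first f frames.
    K = [0] + list(accumulate(subframes_per_frame))
    # One cycle executes K(M) ordinary cells plus one voting cell per frame.
    cycle_len = K[M] + M

    def when_cell(p, f, s):
        return p * cycle_len + (K[f] + f) + s + 1

    def when_vote(q, g):
        return q * cycle_len + (K[g] + g) + subframes_per_frame[g] + 1

    return when_cell, when_vote


def schedule_positions(subframes_per_frame, cycles):
    """List the positions of all cells in execution order."""
    when_cell, when_vote = make_when(subframes_per_frame)
    positions = [0]  # the initialization cell
    for p in range(cycles):
        for f, m_f in enumerate(subframes_per_frame):
            positions.extend(when_cell(p, f, s) for s in range(m_f))
            positions.append(when_vote(p, f))
    return positions
```

For a task graph with frames of 2, 3, and 1 subframes, each cycle occupies 9 positions (6 ordinary cells and 3 voting cells), so two cycles plus the initialization cell fill positions 0 through 18.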

We define orderings $\geq$ and $>$ over (cycle, frame) pairs based on their position in
the execution sequence:

\[
\begin{aligned}
(p,f) \geq (q,g) &\ \text{ if } (p > q) \lor (p = q \land f \geq g), \text{ and} \\
(p,f) > (q,g) &\ \text{ if } (p > q) \lor (p = q \land f > g).
\end{aligned}
\]

We also use the inverse relations $\leq$ and $<$ whenever convenient and extend the
relations to voted cells by the convention

\[
v_{(q,g)} < v_{(r,h)} \ \text{ iff } \ (q,g) < (r,h).
\]

A voting cell $v_{(q,g)}$ is a candidate voting cell for ordinary cell $[p,f,s]$ if $VP(f,s,g)$,
and either $q = p \land g \geq f$ or $q = p + 1 \land g < f$; the candidate cell that is least with
respect to the $<$ ordering is the primary voting cell for $[p,f,s]$, the others are
secondary voting cells for $[p,f,s]$.

An arc $([p,f,s], v_{(q,g)})$ is added to $G$ when $v_{(q,g)}$ is the primary voting cell for
$[p,f,s]$. An arc $(v_{(q,g)}, v_{(r,h)})$ is added to $G$ when $v_{(r,h)}$ is a secondary voting cell for
$[p,f,s]$, and $v_{(q,g)}$ is the largest candidate voting cell for $[p,f,s]$ with respect to the
$>$ ordering such that $(q,g) < (r,h)$. Finally, we replace arcs $([p,f,s],[q,g,t]) \in G$
by arcs $(v_{(r,h)},[q,g,t])$ where $v_{(r,h)}$ is the largest candidate voting cell for $[p,f,s]$
with respect to the $>$ ordering such that $(r,h) < (q,g)$.

We claim that the transformation just described will cast an instance of the 
LaRC model into an instance of our model in a way that preserves its essential 
properties. Despite its notational complexity, the transformation is really quite
simple: it “unrolls” the cyclic schedule of the LaRC model into the flat structure that
we require, and it encodes the frame-based voting of the LaRC model in the voted
cells of our model.


4.1 Specific Voting Patterns 

In the following sections we will derive results similar to those of [9, Section 14] 
for specific voting patterns. We will use the general character of the transformation 
between the LaRC model and ours described above, but will not undertake literal 
translations of the LaRC Theorems. Instead, we will state what we consider to be
the main thrust of the LaRC Theorems directly in the terms of our model, and will
conduct our proofs within that context. In this way, we avoid the tedious labor of the
transformation, preserve the clarity of the presentation of each result, and increase 
its generality of application. We claim, but do not prove, that if the statements of 
the Theorems of [9, Section 14] are transformed in the way described above, then 
the resulting “mapped” theorems will be special cases of those given below. 

All we require to state our first two results is a notion of “frame.” The idea is
that all cells belong to exactly one frame; the members of each frame are executed
sequentially; the last cell executed in each frame is a voted cell, and no other cells
are voted.

Thus we introduce the set

• $F = \{0, 1, \ldots, |f|\}$ of frames, with mapping

• $\textit{frame}: C \rightarrow F$, and equivalence relation

• $\sim\ \subseteq C \times C$, where

\[
a \sim c \equiv \textit{frame}(a) = \textit{frame}(c).
\]

Thus $\textit{frame}(c)$ denotes the frame to which cell $c$ belongs, and $a \sim c$ indicates that
$a$ and $c$ both belong to the same frame. The requirement that all the members of
a frame are executed in sequence, with no members of other frames intervening, is
simply stated by the requirement that the derived function

• $\textit{frame-sched}: M \rightarrow F$,

given by

\[
\textit{frame-sched}(m) = \textit{frame}(\textit{sched}(m)),
\]

should be monotonic increasing.

The final cell executed in a frame is the only voted cell in that frame:

\[
C_v \stackrel{\mathrm{def}}{=} \{c \mid \forall a : a \sim c \supset \textit{when}(a) \leq \textit{when}(c)\}.
\]

It is convenient to let $\textit{voted-cell}(f)$ denote the voted cell for frame $f$.

Equipped with these definitions, we can state and prove results about increas- 
ingly less restricted frame-based voting patterns. 


4.1.1 Continuous Voting 


The idea here is that the entire state of the replicated machine is voted every frame. 
Thus, any cell that requires a value from an earlier frame need only refer to the 
voting cell of the immediately preceding frame. Hence, our formalization is: 

Definition 1 (Continuous Voting) A replicated machine performs continuous
voting if:

\[
(a,c) \in G \supset a \sim c \lor a = \textit{voted-cell}(\textit{frame}(c) - 1).
\]


We have 

Theorem 2 If a majority of machines is working throughout each consecutive pair 
of frames, then the replicated machine is safe under continuous voting. 

Proof: For any cell $c$, we need to ensure that a majority of component machines
are working throughout the period from $\textit{committed-to}(c)$ to $\textit{when}(c)$. The definition
of continuous voting ensures

\[
\textit{when}(\textit{voted-cell}(\textit{frame}(c) - 1)) \leq \textit{committed-to}(c)
\]

and

\[
\textit{when}(c) \leq \textit{when}(\textit{voted-cell}(\textit{frame}(c))).
\]

Hence, the requirement that a majority of machines are working throughout each
consecutive pair of frames is sufficient to ensure that the replicated machine is safe. □


4.1.2 Cyclic Voting

The idea here is that cells in frame $f$ never refer to cells from frames earlier than
$f - e$, where $e$ is a parameter to the design. Further, when cells make “out of frame”
references, it is only to voted cells.

Definition 2 (Cyclic Voting) A replicated machine performs cyclic voting with
period $e$ if:

\[
(a,c) \in G \supset a \sim c \lor (a = \textit{voted-cell}(\textit{frame}(c) - k) \land 1 \leq k \leq e).
\]

(Obviously, there is also a well-formedness condition: $\textit{frame}(c) - k \geq 0$.) Notice
that cyclic voting reduces to continuous voting when $e = 1$.

Theorem 3 If a majority of machines are working throughout each sequence of $e + 1$
consecutive frames, then the replicated machine is safe under cyclic voting.

Proof: For any cell $c$, we need to ensure that a majority of component machines
are working throughout the period from $\textit{committed-to}(c)$ to $\textit{when}(c)$. The definition
of cyclic voting ensures

\[
\textit{when}(\textit{voted-cell}(\textit{frame}(c) - e)) \leq \textit{committed-to}(c)
\]

and

\[
\textit{when}(c) \leq \textit{when}(\textit{voted-cell}(\textit{frame}(c))).
\]

Hence, the requirement that a majority of machines are working throughout each
sequence of $e + 1$ consecutive frames is sufficient to ensure that the replicated
machine is safe. □
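
The two voting disciplines are simple structural conditions on the graph $G$, so they can be checked mechanically. The following Python sketch is an illustration only: the dictionaries `frame` and `voted_cell`, and the representation of $G$ as a list of pairs, are assumptions made for the example and are not part of the formal specification.

```python
def is_cyclic_voting(edges, frame, voted_cell, e):
    """Check the cyclic-voting condition with period e.

    edges      : iterable of arcs (a, c) of the graph G
    frame      : dict mapping each cell to its frame number
    voted_cell : dict mapping each frame number to its voted cell
    Every arc must either stay inside one frame, or start at the voted
    cell of one of the e frames immediately preceding the reader's frame.
    """
    for a, c in edges:
        if frame[a] == frame[c]:
            continue  # in-frame reference: always allowed
        k = frame[c] - frame[a]
        if not (1 <= k <= e and voted_cell.get(frame[a]) == a):
            return False
    return True


def is_continuous_voting(edges, frame, voted_cell):
    """Continuous voting is the special case e = 1."""
    return is_cyclic_voting(edges, frame, voted_cell, 1)
```

For example, with three frames whose voted cells are `v0`, `v1`, and `v2`, an arc from `v0` to a cell of frame 2 is acceptable under cyclic voting with period 2 but violates continuous voting; an arc between unvoted cells of different frames violates both.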


4.1.3 Optimal Voting 

In this section, we examine conditions that allow a replicated machine to vote as 
little data as possible, and as seldom as possible, yet still be able to recover from 
transient failures in a fixed amount of time. 

The general condition is very simple to state, but not very interesting: 

Lemma 1 If there exists a constant $B$ such that

\[
\forall c : \textit{when}(\textit{voted-cell}(\textit{frame}(c) - B)) \leq \textit{committed-to}(c),
\]

and a majority of machines are working throughout each sequence of $B + 1$ consecutive
frames, then the replicated machine is safe.

Proof: This result follows by the same argument used in Theorem 3. □

The conditions become more interesting when we consider cyclic schedules. It
is natural and convenient to think of cyclic schedules as generated by repeatedly
“unrolling” a more basic schedule for a single cycle. We assume such basic schedules
to be composed of “basic cells” of the form $[f, s]$ where $f$ is the frame, and $s$ the
subframe. A relation $\rightarrow$ defines the dataflow relationships among the basic cells:
$[f,s] \rightarrow [g,t]$ means that subframe $s$ of frame $f$ provides input to subframe $t$ of frame
$g$. Cells are executed in order by frames, and in subframe order within frames. As
before, we assume there are $M$ frames.

So far, this model is the same as the LaRC model [9]; a difference is that here 
we allow arbitrary basic cells to be designated as voted cells, whereas the LaRC 
model considers voting to take place at the end of each frame and indicates that 
cell $[f, s]$ is voted in frame $n$ by $VP(f,s,n)$. As explained at the beginning of this
chapter, there is a straightforward transformation from the standard LaRC model 
to the variant used here. 

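The frame length of a step $[f,s] \rightarrow [g,t]$ is defined by

\[
\begin{cases}
0 & \text{if } f = g \land s < t, \\
M & \text{if } f = g \land s > t, \\
g - f & \text{if } f < g, \text{ and} \\
M + (g - f) & \text{if } f > g.
\end{cases}
\]

A path in the basic schedule is a sequence of cells

\[
\langle [f,s], [g,t], \ldots, [h,u] \rangle
\]

such that

\[
[f,s] \rightarrow [g,t] \rightarrow \cdots \rightarrow [h,u].
\]

The frame length of a path is the sum of the frame lengths of its individual steps.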

We “unroll” the basic model to yield cells of the form $[p,f,s]$ where $p$ is the
cycle, and $f$ and $s$ are the frame and subframe as before. The graph $G$ comprises
pairs of cells $([p,f,s],[q,g,t])$ such that $[f,s] \rightarrow [g,t]$ in the basic model and

\[
\begin{aligned}
p &= q && \text{if } (f < g) \lor (f = g \land s < t), \\
p &= q - 1 && \text{if } (f > g) \lor (f = g \land s > t).
\end{aligned}
\]

A cell $[p,f,s]$ is voted if $[f,s]$ is designated as a voted cell in the basic schedule;
$[p,f,s]$ is a sensor cell if it has indegree zero in $G$ (i.e., if there is no basic cell $[g,t]$
such that $[g,t] \rightarrow [f,s]$).

The frame-time of a cell is its position in the execution sequence:

\[
\textit{frame-time}([p,f,s]) = p \times M + f;
\]

the frame length of an arc $([p,f,s],[q,g,t])$ in the graph $G$ is defined to be

\[
\textit{frame-time}([q,g,t]) - \textit{frame-time}([p,f,s]).
\]
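
The relationship between the frame length of a basic step and the frame length of the corresponding unrolled arc can be checked by direct computation. The Python sketch below is illustrative only; basic cells are represented as pairs `(f, s)` and unrolled cells as triples `(p, f, s)`, and the function names are ours. The definitions in the text do not cover a step from a cell to itself (f = g and s = t); the sketch assumes such a step refers to the previous cycle.

```python
def step_frame_length(src, dst, M):
    """Frame length of a basic step [f,s] -> [g,t] with M frames per cycle."""
    (f, s), (g, t) = src, dst
    if f == g:
        # Assumption: s == t is treated like s > t (value from the previous cycle).
        return 0 if s < t else M
    return g - f if f < g else M + (g - f)


def path_frame_length(path, M):
    """Sum of the frame lengths of the individual steps of a basic path."""
    return sum(step_frame_length(a, b, M) for a, b in zip(path, path[1:]))


def unroll_arc(src, dst, q):
    """Arc ([p,f,s],[q,g,t]) of G generated by [f,s] -> [g,t] for target cycle q."""
    (f, s), (g, t) = src, dst
    same_cycle = f < g or (f == g and s < t)
    p = q if same_cycle else q - 1
    return (p, f, s), (q, g, t)


def frame_time(cell, M):
    """Position of an unrolled cell, counted in frames."""
    p, f, _ = cell
    return p * M + f
```

With these definitions, `frame_time(b, M) - frame_time(a, M)` for `a, b = unroll_arc(src, dst, q)` agrees with `step_frame_length(src, dst, M)` in each of the four cases, and is never negative.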

Notice that the construction ensures that this value is nonnegative, and that it equals
the frame length of $[f,s] \rightarrow [g,t]$ in the basic schedule. A path in the (unrolled)
schedule is a sequence of cells

\[
\langle [p,f,s], \ldots, [r,h,u] \rangle
\]

such that each consecutive pair of cells are connected by an arc in the graph $G$. The
frame length of this path is defined as

\[
\textit{frame-time}([r,h,u]) - \textit{frame-time}([p,f,s]).
\]

It is easy to see that this equals the sum of the frame lengths of the individual arcs,
and that it also equals the frame length of the basic path

\[
\langle [f,s], \ldots, [h,u] \rangle.
\]

A path

\[
\langle [p,f,s], \ldots, [r,h,u] \rangle
\]

is a commitment path if

• The cell [p, f, s] is either a sensor cell or a voted cell, and 

• No other cells in the sequence are voted, except possibly the last. 

Then we have 


Lemma 2 If there exists a bound $B$ on the frame-length of any commitment-path,
and a majority of machines are working throughout each sequence of $B + 1$ consecutive
frames, then the replicated machine is safe.

Proof: If

\[
\langle [p,f,s], \ldots, [r,h,u] \rangle
\]

is a commitment-path, then

\[
[p,f,s] \in \textit{support}([r,h,u]).
\]

If no commitment-path has frame length longer than $B$, it follows that

\[
\textit{when}(\textit{voted-cell}(\textit{frame}([r,h,u]) - B)) \leq \textit{committed-to}([r,h,u])
\]

and the result follows by the previous lemma. □


The existence of the bound $B$ is determined by the presence of vote-free cycles
(loops) in the basic task graph:

Lemma 3 There exists a bound $B$ on the frame-length of any commitment-path if
and only if all cycles in the basic task graph contain at least one voted cell.

Proof: Suppose there is no such bound $B$. Then there are commitment-paths of
arbitrary frame lengths, and therefore of arbitrary lengths, since the frame length
of any individual step is fixed. Since the number of basic cells is fixed and finite, it
follows that there must exist a commitment-path of the form

\[
\langle \ldots, [p,f,s], \ldots, [q,f,s], \ldots \rangle
\]

in which the components of some unvoted basic cell $[f,s]$ are repeated and no voted
cells appear in between. The construction of the graph $G$ is such that this can only
happen if there is a cycle

\[
[f,s] \rightarrow \cdots \rightarrow [f,s]
\]

in the task graph comprising only unvoted cells.

Suppose, on the other hand, that the basic task graph contains a cycle

\[
[f,s] \rightarrow \cdots \rightarrow [f,s]
\]

comprising only unvoted cells. Then a commitment-path can be constructed containing
a segment derived from enough iterations of this basic cycle that the frame-length
exceeds any fixed bound $B$. □

Combining these lemmas, we obtain 

Theorem 4 Recovery from transient faults is possible if and only if there are no
vote-free cycles in the basic task graph. Further, if all paths of the form

\[
\langle [f,s] \rightarrow [g,t] \rightarrow \cdots \rightarrow [h,u] \rangle,
\]

where at most the first and last elements are voted, have path lengths no longer than
$B$, and if a majority of machines are working throughout each sequence of $B + 1$
consecutive frames, then the replicated machine is safe.

Proof: Combine the preceding three lemmas. □
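
The condition of Theorem 4 lends itself to a simple mechanical check on a basic task graph. The following Python sketch is one possible way to do this and is not taken from the formal development: it represents basic cells as pairs `(f, s)`, takes the voted cells as a set, and either reports a vote-free cycle or returns the largest frame length of any path whose interior cells are unvoted. The names `commitment_bound` and `step_frame_length` are ours, and a step from a cell to itself is assumed to span a whole cycle.

```python
def step_frame_length(src, dst, M):
    """Frame length of a basic step [f,s] -> [g,t] with M frames per cycle."""
    (f, s), (g, t) = src, dst
    if f == g:
        return 0 if s < t else M  # assumption: s == t spans a whole cycle
    return g - f if f < g else M + (g - f)


def commitment_bound(arcs, voted, M):
    """Largest frame length of a path in which at most the first and last
    cells are voted, or None if a cycle of unvoted cells makes it unbounded.

    arcs  : iterable of basic steps ((f, s), (g, t))
    voted : set of basic cells designated as voted
    """
    succ = {}
    for src, dst in arcs:
        succ.setdefault(src, []).append(dst)
        succ.setdefault(dst, [])

    memo, in_progress = {}, set()

    def longest_from(u):
        if u in memo:
            return memo[u]
        if u in in_progress:
            raise ValueError("vote-free cycle")
        in_progress.add(u)
        best = 0
        for w in succ[u]:
            # A voted successor ends the path; an unvoted one may extend it.
            tail = 0 if w in voted else longest_from(w)
            best = max(best, step_frame_length(u, w, M) + tail)
        in_progress.discard(u)
        memo[u] = best
        return best

    try:
        return max((longest_from(u) for u in succ), default=0)
    except ValueError:
        return None
```

For a two-frame graph with steps [0,0] → [0,1] → [1,0] → [0,0], designating [1,0] as voted yields a bound of 2 frames, whereas leaving all three cells unvoted yields no bound.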


Chapter 5 

Discussion and Conclusions 


We begin with a consideration of possible extensions to this work. These extensions 
fall into four categories, listed in order of increasing complexity: 

• Proof of additional properties within the current model, 

• Modification of the current model in order to enhance its abstractness, 

• Development of more concrete models on top of the current model, and 

• Significant extensions to the model in order to encompass a wider class of 
systems. 

We consider each of these categories in turn. 

A topic where additional proofs would expose the underlying requirements more
clearly concerns the retention of stored values. The current model treats the system
state as a function recording the values of all cells encountered during the entire
lifetime of the system. Obviously this is not how we expect the system to be implemented.
It is intuitively clear that the only cells whose values need to be retained
are those which have been computed but not yet used; that is, the value of cell $c$
needs to be retained only for the interval from $\textit{when}(c)$ to $\max\{\textit{when}(a) \mid (c,a) \in G\}$.

This can be specified by modifying the definition of the basic function step.
Currently, we have

\[
\textit{step}(\sigma, c) = \sigma \ \textbf{with}\ [c := \textbf{if}\ c \in C_s\ \textbf{then}\ \textit{sensor}(c)(\textit{when}(c))\ \textbf{else}\ \textit{task}(c)(\sigma)].
\]

This definition can be replaced by two axioms specifying a modified function step':

\[
\textit{step}'(\sigma, c)(c) = \textbf{if}\ c \in C_s\ \textbf{then}\ \textit{sensor}(c)(\textit{when}(c))\ \textbf{else}\ \textit{task}(c)(\sigma)
\]

and

\[
\forall a, b : (a,b) \in G \land \textit{when}(a) < \textit{when}(c) < \textit{when}(b) \supset \textit{step}'(\sigma, c)(a) = \sigma(a).
\]

To establish that step' is an adequate replacement for step we need to prove that
the actuator commands are the same in both cases.
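
The intent of the modified function can be illustrated by a small simulation. The Python sketch below is an informal model, not the Ehdm specification: cells are strings, `preds` lists the arcs of $G$ by reader, and `compute` supplies one function per cell (sensor cells simply ignore their inputs, and the time argument of sensors is omitted). With `prune=True` the run discards each value once its last reader has executed, which is one behavior the two axioms permit.

```python
def run(schedule, preds, compute, prune=False):
    """Execute cells in schedule order.

    schedule : list of cells in execution order
    preds    : dict mapping a cell c to the cells a with (a, c) in G
    compute  : dict mapping a cell to a function of its predecessors' values
    prune    : if True, drop a value once its last reader has executed
    Returns (value computed for each cell, peak number of values held
    between steps).
    """
    when = {c: m for m, c in enumerate(schedule)}
    last_use = {}
    for c in schedule:
        for a in preds.get(c, ()):
            last_use[a] = max(last_use.get(a, when[a]), when[c])

    state, outputs, peak = {}, {}, 0
    for c in schedule:
        # A KeyError here would mean a value was discarded too early.
        inputs = {a: state[a] for a in preds.get(c, ())}
        state[c] = outputs[c] = compute[c](inputs)
        if prune:
            done = [a for a in state if last_use.get(a, when[a]) <= when[c]]
            for a in done:
                del state[a]
        peak = max(peak, len(state))
    return outputs, peak
```

Running the same schedule with and without pruning produces identical cell values, which is the informal counterpart of the claim that the actuator commands are the same in both cases; only the amount of retained state differs.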

There are two ways to carry out this proof. One would establish a variant
specification for simple_machine using step' instead of step, and would prove that
actuator outputs are the same in both cases; that is, it would verify a theorem
of the form runto(c)(c) = runto'(c)(c). This approach would leave the existing
specification and verification unchanged but would require a fairly extensive new
verification that would mirror, in many respects, the verification already performed.
The other approach would modify repl_machine to use step' instead of step and
would then carry this additional complication along in the proof of fault masking.
This approach is probably the simpler, since the definition of step is used only five
times in proofs concerning the replicated machine.

A topic where increased abstraction in the current model and verification would 
expose underlying requirements more clearly is the choice of voting strategy. The 
current model is firmly based on majority voting, but other strategies such as plural- 
ity voting have attractions. As long as the working machines constitute an absolute 
majority, plurality voting exhibits the same behavior as majority voting. If the 
working machines should fail to form an absolute majority, however, the majority- 
voted system will break down, whereas a plurality-voted system may break down 
or may not, depending on whether enough of the failed machines agree on a common,
wrong value to win the plurality vote. There seems to be no way to measure
the likelihood of this latter event, nor any sound way to engineer a system so that
failed machines are unlikely to agree, and so we do not advocate the use of plurality
voting as a way to enhance the claimed reliability of the system. There seems little
harm, however, and possibly some value, in using voting strategies that are more
robust than strict majority, so that there is at least some chance the system may
continue to work even after an explosion, or other catastrophic event, has rendered
$10^{-9}$ irrelevant.$^1$
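
The difference between the two strategies can be made concrete with a small Python sketch. It is an illustration under simplifying assumptions: channel outputs are given as a non-empty list of comparable values, and a result of `None` stands for the case in which the vote yields no value. Neither function is part of the formal model.

```python
from collections import Counter


def majority_vote(values):
    """Return the value held by more than half of the channels, else None."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else None


def plurality_vote(values):
    """Return the unique most frequent value, else None on a tie."""
    ranked = Counter(values).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

Suppose the correct value is 7. With outputs `[7, 7, 7, 9, 2]` both votes return 7. With `[7, 7, 3, 9, 2]` the working channels no longer form a majority, so the majority vote yields no value, while the plurality vote still returns 7 because the failed channels disagree. With `[7, 7, 9, 9, 9, 1, 2]` three failed channels agree on 9, and the plurality vote returns that wrong value.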

These considerations provide the motivation for a more careful examination of
the voting and fault-model assumptions required for the Consensus Theorem to
hold. There are two places in the present development where the properties of
strict majority voting are employed. One, noted in Chapter 3, is in the proof of
torch-carried; the other is in the proof of vote-lemma in module voted_step.
It would be very worthwhile to revisit these proofs and to determine a minimal
characterization of the properties actually required of the voting function in order
for the fault-masking properties to be retained. (Majority is a strict requirement for
the torch-carried property, but there seem to be other ways to conduct the part of
the proof in which this property is used.) The ability to conduct such investigations
is one of the benefits of a truly formal development: the axiomatic and definitional
basis of the development is known precisely, and the effect of controlled variations
can be rigorously explored.$^2$

$^1$Paul Miner of NASA LaRC first drew these considerations to our attention.

A prime candidate for a more concrete model to be constructed on top of the one 
developed here is that of Di Vito, Butler and Caldwell. As indicated in Chapter 4, 
the main results proved for that model can also be derived from ours; it would be 
interesting to formally verify those derivations. At a later stage in this program of 
work, when an actual design for a reliable computing platform for DFCS has been 
developed, it will be valuable to attempt to instantiate our model for that design. 

The characteristics of some potential system designs cannot be seen as instan- 
tiations of our model: it will be necessary to significantly revise and extend the 
model in order to accommodate such designs. Among the revisions and extensions 
that would be most illuminating are those that break the lock-step synchronization 
of task executions in the component machines. One extension would still require 
the same workload for each component machine, but would allow them to execute 
different schedules. Obviously there are constraints that require a notion of “consis- 
tency” to be satisfied among schedules — they must synchronize for votes and must 
not deadlock, for example. The practical benefit of allowing different schedules on 
different channels is that simultaneous transient failures of several channels, such as 
a lightning strike might induce, will be less likely to all affect the activations of a 
single task; instead, the damage will be shared among several different tasks, and 
all may still be executed by a majority of working processors. 

Another extension would introduce different workloads for different machines. 
This allows different quantities of replication for different activities and permits 
better utilization of resources. For example, one really critical activity may run 
on all processors, another less critical one may run on only three, while another, 
presumably unimportant, task may run on but a single machine. 

So much for future extensions; we now turn to a consideration of the significance 
of the work actually performed. The work described is just one of the first steps in 
a much larger program and it would be premature to evaluate the overall program 
at this stage. We can, however, ask what the model developed here contributes to 
a science of DFCS design, and we can ask what further value is contributed by its 
formal specification and verification. 

$^2$It may seem moot to explore the circumstances under which a Consensus Theorem can hold with
less than working channels when the underlying Byzantine fault-tolerant sensor distribution
and clock synchronization algorithms require working channels. Our response is that it would
be worthwhile to investigate the behavior of these Byzantine fault-tolerant algorithms when fewer
than the required channels are available. It should be possible to tolerate non-Byzantine failures
with only working channels, but it is unknown whether the standard Byzantine algorithms
do so. There has, however, been some investigation of algorithms that tolerate multiple failure
modes [58, 59].

Clearly, our model addresses only a small fragment (redundancy management)
of the overall problem of DFCS design, and is a highly abstracted representation of
that fragment. Small though that fragment may be, however, the evidence cited in
Section 1.2 suggests that it is one of the most crucial problems; if managed poorly, 
redundancy can reduce, rather than enhance, the overall reliability of a DFCS. 
Recall the summary of Mackall [23, pp. 40-41] quoted on page 8, and which reads 
in part: 

“. . . qualification of such a complex system as this, to some given level 
of reliability, is difficult . . . [because] the number of test conditions be- 
comes so large that conventional testing methods would require a decade 
for completion. The fault-tolerant design can also affect overall sys- 
tem reliability by being made too complex and by adding characteristics 
which are random in nature, creating an untestable design. 

“. . .reducing complexity appears to be more of an art than a science 
and requires an experience base not yet available. If the complexity is 
required, a method to make system designs more understandable, more 
visible, is needed.” 

The purpose of the work described here (and of the larger program) is precisely to 
address these pleas for testable designs, purged of “random characteristics,” and 
which are more “understandable, more visible.” 

We contend that our model shows that certain principles of design — Byzantine 
fault tolerant distribution of sensor samples, loosely synchronized execution, ma- 
jority voting of all actuator outputs, and periodic majority voting of internal 
state data — provide predictable behavior that masks faults and provides transient- 
recovery. These principles of design are encoded in the axioms and definitions of 
our model; the conclusion is derived by mathematical reasoning from that basis. 

Other models have been devised that address similar problems. A general 
method, known rather misleadingly as the “state-machine approach” for construct- 
ing reliable systems from unreliable components that periodically vote their results 
was developed by Lamport in a series of classic papers [60-62] (see also Schneider’s 
tutorial [63]). The development here can be seen as a modification of Lamport’s 
“state-machine” approach to the case where voting is performed intermittently. 

The model most similar to ours is, of course, that of Di Vito, Butler and Caldwell
[9, 10]. The formal connection between the two models was discussed in Chapter 4;
here we consider less tangible issues: style, abstractness, and the influence of
formal verification.

A maxim usually attributed to Einstein holds that a theory should be “as simple 
as possible — but no simpler.” In our domain, simplicity is closely related to the ab- 
stractness of the model considered: the advantage of abstraction is that it reduces 
a problem to its simplest form and exposes its essential properties to scrutiny, un- 
cluttered by extraneous matter; the danger is that too much is left out, so that the 
model fails to capture those aspects of reality that are of interest. When formal
verification is undertaken, abstraction has economic, as well as philosophical,
consequences: it will generally be easier, and hence require fewer resources, to verify an
abstract model than a more concrete one. Furthermore, the abstract model should 
have wider applicability, and hence the cost of its verification can be amortized over 
more instantiations. Of course, the cost of one instantiation must be borne in order 
to reach the level of detail considered in the more concrete model. 

Our model is considerably more abstract than that of Di Vito, Butler and Cald- 
well; we explained the reasons for our choices in Section 1.3.1 and considered the 
reconciliation between the two models in Chapter 4. For the purpose of formal ver- 
ification, we consider our model to have distinct advantages: it has been subjected, 
essentially without change, to formal specification and mechanical proof checking 
in Ehdm, whereas we believe that direct verification of the LaRC model would be
a considerable challenge. Whether the added concreteness of the LaRC model ren- 
ders it a more effective specification for human review is something we leave to our 
readers to decide. 

The remaining question we consider is whether formal specification and mechanical
proof checking added anything of value to the quasi-formal description and
proof presented in Chapter 2. The first thing to note is that the description and
proof given in Chapter 2 were heavily influenced by the formal verification, both
before and after the latter was performed. They were influenced even before the formal
verification was attempted because the model was constructed with formal specification
and verification (in Ehdm) in mind. Hence, it is expressed directly in terms
of (higher-order) functions; the LaRC model, on the other hand, uses vectors, sequences,
sets, and iterated conjunction operators. These can all be expressed in
terms of (higher-order) functions and we would not hesitate to use them where they
contribute to clarity; on the other hand, we generally prefer to do without these
constructs when a comparably simple specification can be found that is expressed
directly in terms of functions. After the formal verification had been performed, we
revised some of the definitions and the proof of Chapter 2 in order to bring them
more closely into line with the corresponding Ehdm versions.

There is one improvement derived from the formal verification that we did not
retrofit to the development of Chapter 2: this is a stronger formulation of the main
Consensus Theorem. The Consensus Theorem is stated as

If $\textit{safe}(c)$, then

\[
\forall j \in R : \textit{OK}(j)(c) \supset \textit{good-value}(j, c),
\]

where

\[
\textit{safe}(c) \stackrel{\mathrm{def}}{=} (\forall a : \textit{when}(a) \leq \textit{when}(c) \supset \textit{MOK}(a)).
\]

In the Ehdm verification, the Theorem was strengthened by giving a weaker
(recursive) definition for safe:

\[
\textit{safe}(c) \stackrel{\mathrm{def}}{=} \textit{MOK}(c) \land (\forall a : (a,c) \in G \supset \textit{safe}(a)).
\]

The stronger theorem requires only that the replicated machine is MOK for all
those cells that transitively contribute input to cell c; the weaker form requires it
be MOK for all cells executed prior to c.
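
The two definitions of safe can be compared on a small example. The Python sketch below is illustrative and uses names of our own choosing: `when` maps cells to schedule positions, `preds` maps a cell to the cells that feed it in $G$, and `mok` records whether a majority of machines is working at each cell. The recursion terminates because arcs of $G$ always lead forward in the schedule.

```python
def safe_all_prior(c, when, mok):
    """Original definition: MOK holds for every cell scheduled up to c."""
    return all(mok[a] for a in when if when[a] <= when[c])


def safe_recursive(c, preds, mok):
    """Recursive definition: MOK holds for c and for every cell that
    transitively contributes input to c."""
    return mok[c] and all(safe_recursive(a, preds, mok) for a in preds.get(c, ()))
```

Consider cells x, y, and z executed in that order, where z reads only x and a majority is lost while y executes. Then `safe_all_prior` rejects z, whereas `safe_recursive` accepts it, since y contributes nothing to z; the strengthened theorem therefore still applies to z.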

Obviously, the stronger theorem could have been stated and proved in the quasi- 
formal development just as well as the weaker one. The significant point, however, is 
that it was the weaker formulation, and correspondingly a proof by simple induction, 
that arose most naturally in the quasi-formal development. In formal verification, 
the familiar convenience of simple induction is less of a driving force, and we were 
led to contemplate the stronger theorem, which requires a more difficult Noetherian 
induction. 

The main benefit that we see accruing from the mechanically checked verification 
is the precision with which the underlying assumptions are now known. Formally, 
this basis consists of 18 axioms (of which only 11 are directly concerned with the 
model, while the remaining 7 deal with supporting concepts such as cardinality), and 
15 definitions (which provide only conservative extensions in Ehdm). Informally, we
have acquired a much better appreciation of the issues concerning the retention of 
stored values, and of the way in which fault masking is dependent on the properties 
of majority (as opposed to other kinds of) voting. As described above, we are now 
in a position to investigate these issues formally. 

In future work, we hope to explore these issues, and also to extend our formal 
specification and verification toward the behavior of a realistic operating system 
that will implement the fault-masking techniques modeled here. The next step will 
be to combine the model used here with that for clock synchronization [38], in order 
to consider the more realistic case of replicated computers that are synchronized 
only within some bound $\delta$, and in which computation and communication take a
certain amount of time.


Appendix A

LaTeX-printed Specification Listings


The following specification listings were formatted and converted to mathematical
notation automatically using the Ehdm LaTeX-printer.

simple_machine: Module 
Exporting all 
Theory 
n: Var nat 
M: Type is nat 
m: Var M 
C, D: Type 
a, c: Var C 

cell-types: Type = (sensor.cell, actuator jcell, task_cell) 
cell-type: function [C — ► cell-types] 

Cs • Type from C with (Ac: cell_type(c) = sensor.cell) 

Ca • Type from C with (Ac: cell_type(c) = actuator_cell) 

Ct • Type from C with ( Ac : cell.type(c) / sensor-cell) 
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Appendix A. IATj?X-printed Specification Listings 


start_cell: Cs 
arb_task: Ct 
arb_actuator: Ca 

(*1,*2) € G: function [C, C — ♦ bool] 
sensor_ax: Axiom ( 3 a : (a, c) £ G) O ~>(c in Cs) 
sched: function [M — > C] 
when: function [C — ► M] 

Gbar_when: Axiom (a, c) G G D when(a) < when(c) 
sched .when _ax: Axiom (sched(m) = a) (m = when(a)) 
dowhen_pos: Axiom when(c) > 0 
p, q: Var M 

unique jwhen: Lemma p ^ q 3 sched(p) ^ sched(<y) 
previous: functionfC — ► C] —= (Ac: sched(pred(when(c)))) 
sched_when Jemma: Lemma a = sched(when(a)) 
when_schedJemma: Lemma m = when(sched(m)) 
dowhen.previous: Lemma when(previous(c)) = pred(when(c)) 
state: Type is functionfC — ♦ D ] 
initial-state: state 
5, t: Var state 

sensor_fn: Type is functionfM — > D ] 
sensor: functionfCs — ► sensorJn] 
task Jh: Type is functionfstate — ► D\ 
task: functionfCj — > ► task_fn] 
dependency: Axiom 
  c in Ct ∧ (∀ a : (a, c) ∈ G ⊃ s(a) = t(a)) 
    ⊃ task(c)(s) = task(c)(t) 

step: function[state, C, M → state] = 
  (λ s, c, m : s 
    with [(c) := 
      if c in Cs then sensor(c)(m) else task(c)(s) end if]) 

identity: function[M → nat] == (λ m : m) 

run: Recursive function[M → state] = 
  (λ m : 
    if m = 0 then initial_state else step(run(m − 1), sched(m), m) end if) 
  by identity 

runto: function[C → state] == (λ c : run(when(c))) 

Proof 

sched_when_proof: Prove sched_when_lemma from 
  sched_when_ax {m ← when(a)} 

when_sched_proof: Prove when_sched_lemma from 
  sched_when_ax {a ← sched(m)} 

dowhen_prev_proof: Prove dowhen_previous from 
  when_sched_lemma {m ← pred(when(c))} 

unique_when_proof: Prove unique_when from 
  when_sched_lemma {m ← p}, when_sched_lemma {m ← q} 

End simple_machine 
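
As an informal illustration of the step and run definitions in simple_machine, the following Python sketch executes a tiny schedule. The three cell names, the dictionary representation of a state, and the example sensor and task functions are assumptions made purely for illustration; none of them comes from the Ehdm specification, and the iterative loop merely stands in for the recursive definition of run.

```python
# Hypothetical three-cell machine: a sensor feeds a task, which feeds an actuator.
SENSORS = {"s"}                           # plays the role of Cs
SCHED = {1: "s", 2: "t", 3: "act"}        # sched: M -> C, for the frames used here
WHEN = {c: m for m, c in SCHED.items()}   # when: C -> M, the inverse of sched


def sensor(c, m):
    """Assumed sensor reading for cell c in frame m."""
    return 10 * m


TASKS = {
    "t": lambda s: s["s"] + 1,      # depends only on cell "s"
    "act": lambda s: 2 * s["t"],    # depends only on cell "t"
}


def step(s, c, m):
    """s with [(c) := if c in Cs then sensor(c)(m) else task(c)(s)]."""
    new = dict(s)
    new[c] = sensor(c, m) if c in SENSORS else TASKS[c](s)
    return new


def run(m, initial_state):
    """Iterative form of run: apply step for frames 1..m in schedule order."""
    s = dict(initial_state)
    for k in range(1, m + 1):
        s = step(s, SCHED[k], k)
    return s


def runto(c, initial_state):
    """State reached at the frame in which cell c is scheduled."""
    return run(WHEN[c], initial_state)


INITIAL = {"s": 0, "t": 0, "act": 0}
final = runto("act", INITIAL)
```

Under these assumptions, runto("act", INITIAL) applies the sensor in frame 1, the task in frame 2, and the actuator computation in frame 3, each reading only the cells it depends on, as the dependency axiom requires.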
simple_machine_tcc: Module 
Using simple_machine 
Exporting all with simple_machine 
Theory 

m: Var naturalnumber 
a: Var C 
c: Var C 

s: Var function[C → D] 
t: Var function[C → D] 

sensors_TCC1: Formula (∃ c : cell_type(c) = sensor_cell) 

actuators_TCC1: Formula (∃ c : cell_type(c) = actuator_cell) 

active_tasks_TCC1: Formula (∃ c : cell_type(c) ≠ sensor_cell) 

dependency_TCC1: Formula 
  (c in Ct ∧ (∀ a : (a, c) ∈ G ⊃ s(a) = t(a))) 
    ⊃ (cell_type(c) ≠ sensor_cell) 

step_TCC1: Formula (c in Cs) ⊃ (cell_type(c) = sensor_cell) 
step_TCC2: Formula (¬(c in Cs)) ⊃ (cell_type(c) ≠ sensor_cell) 
run_TCC1: Formula (¬(m = 0)) ⊃ (m − 1 ≥ 0) 
run_TCC2: Formula (¬(m = 0)) ⊃ identity(m) > identity(m − 1) 

Proof 

sensors_TCC1_PROOF: Prove sensors_TCC1 
actuators_TCC1_PROOF: Prove actuators_TCC1 
active_tasks_TCC1_PROOF: Prove active_tasks_TCC1 
dependency_TCC1_PROOF: Prove dependency_TCC1 
step_TCC1_PROOF: Prove step_TCC1 
step_TCC2_PROOF: Prove step_TCC2 
run_TCC1_PROOF: Prove run_TCC1 
run_TCC2_PROOF: Prove run_TCC2 
End simple_machine_tcc 



simple_machine_tcc_proofs: Module 

Proof 

Using simple_machine_tcc 

sensors_TCC1_PROOF: Prove sensors_TCC1 {c ← start_cell} 

active_tasks_TCC1_PROOF: Prove active_tasks_TCC1 {c ← arb_task} 
  from distinct_cell_types 

actuators_TCC1_PROOF: Prove actuators_TCC1 {c ← arb_actuator} 


End simple_machine_tcc_proofs 
noetherian: Module [dom: Type, <: function[dom, dom → bool]] 
Assuming 

measure: Var function[dom → nat] 

a, b: Var dom 

well_founded: Formula 
  (∃ measure : a < b ⊃ measure(a) < measure(b)) 

Theory 

p, A, B: Var function[dom → bool] 
d, d1, d2: Var dom 
general_induction: Axiom 
  (∀ d1 : (∀ d2 : d2 < d1 ⊃ p(d2)) ⊃ p(d1)) ⊃ (∀ d : p(d)) 

d3, d4: Var dom 

mod_induction: Theorem 
  (∀ d3, d4 : d4 < d3 ⊃ A(d3) ⊃ A(d4)) 
    ∧ (∀ d1 : (∀ d2 : d2 < d1 ⊃ (A(d1) ∧ B(d2))) ⊃ B(d1)) 
    ⊃ (∀ d : A(d) ⊃ B(d)) 

Proof 

mod_proof: Prove 
  mod_induction {d1 ← d1@p1, d3 ← d1@p1, d4 ← d2} 
  from general_induction {p ← (λ d : A(d) ⊃ B(d))} 

End noetherian 
natinduction: Module 
Theory 

Var nat 

p: Var function[nat → bool] 

induction: Theorem (p(0) ∧ (∀ i : p(i) ⊃ p(i + 1))) ⊃ p(n) 
induction_m: Theorem 
  p(m) ∧ (∀ i : i ≥ m ∧ p(i) ⊃ p(i + 1)) ⊃ (∀ n : n ≥ m ⊃ p(n)) 
limited_induction: Theorem 
  (m ≤ m1 ⊃ p(m)) ∧ (∀ i : i ≥ m ∧ i < m1 ∧ p(i) ⊃ p(i + 1)) 
    ⊃ (∀ n : n ≥ m ∧ n ≤ m1 ⊃ p(n)) 


Proof 

Using noetherian 

prev: function[nat, nat → bool] == (λ m, n : m + 1 = n) 
instance: Module is noetherian[nat, prev] 
x: Var nat 

identity: function[nat → nat] == (λ n : n) 

discharge: Prove well_founded {measure ← identity} 

ind_proof: Prove induction {i ← pred(d1@p1)} from 
  general_induction {d ← n, …} 

ind_m_proof: Prove induction_m {i ← i@p1 + m} from 
  induction 
    {p ← (λ x : p@c(x + m)), 
     n ← if n > m then n − m else 0 end if} 

limited_proof: Prove limited_induction {i ← i@p1} from 
  induction_m {p ← (λ x : x ≤ m1 ⊃ p@c(x))} 


End natinduction 


natinduction_tcc: Module 
Using natinduction 
Exporting all with natinduction 
Theory 

m: Var naturalnumber 
n: Var naturalnumber 
ind_m_proof_TCC1: Formula 
  (m ≥ 0) ∧ (n ≥ 0) ⊃ (if n > m then n − m else 0 end if ≥ 0) 

Proof 

ind_m_proof_TCC1_PROOF: Prove ind_m_proof_TCC1 


End natinduction_tcc 
simple_props: Module 
Using simple_machine, natinduction 
Exporting with simple_machine 
Theory 
a, c: Var C 

stay_correct_simple: Lemma 
  (a, c) ∈ G ⊃ runto(previous(c))(a) = runto(a)(a) 

simple_sensor_step_lemma: Lemma 
  c in Cs ⊃ runto(c)(c) = sensor(c)(when(c)) 

simple_step_lemma: Lemma 
  ¬(c in Cs) ⊃ runto(c)(c) = task(c)(run(pred(when(c)))) 

Proof 

m: Var M 

indstep: Lemma run(m)(a) = runto(a)(a) ⊃ run(m + 1)(a) = runto(a)(a) 

indstep_proof: Prove indstep from 
  run {m ← m + 1}, 
  step {s ← run(m), c ← sched(m + 1), m ← m + 1}, 
  unique_when {p ← when(a), q ← m + 1}, 
  sched_when_lemma 

q: Var M 

stay_simple_proof: Prove stay_correct_simple from 
  induction_m 
    {p ← (λ q : run(q)(a) = runto(a)(a)), 
     m ← when(a), 
     n ← when(previous(c))}, 
  indstep {m ← …}, 
  sched_when_lemma {a ← previous(c)}, 
  Gbar_when, 
  when_sched_lemma {m ← pred(when(c))} 

simple_sensor_step_proof: Prove simple_sensor_step_lemma from 
  run {m ← when(c)}, 
  step {s ← run(pred(when(c))), m ← when(c), c ← c}, 
  sched_when_lemma {a ← c}, 
  dowhen_pos 

simple_step_lemma_proof: Prove simple_step_lemma from 
  run {m ← when(c)}, 
  step {m ← when(c), s ← run(pred(when(c)))}, 
  sched_when_lemma {a ← c}, 
  dowhen_pos 


End simple_props 
simple_props_tcc: Module 
Using simple_props 
Exporting all with simple_props 
Theory 

c: Var simple_machine.C 

i: Var naturalnumber 

simple_sensor_step_lemma_TCC1: Formula 
  (c in Cs) ⊃ (cell_type(c) = sensor_cell) 

simple_step_lemma_TCC1: Formula 
  (¬(c in Cs)) ⊃ (cell_type(c) ≠ sensor_cell) 

Proof 

simple_sensor_step_lemma_TCC1_PROOF: Prove 
  simple_sensor_step_lemma_TCC1 

simple_step_lemma_TCC1_PROOF: Prove simple_step_lemma_TCC1 
End simple_props_tcc 
sets: Module [T: Type] 

Exporting all 
Theory 

set: Type is function[T → bool] 
x, y, z: Var T 
a, b: Var set 

*1 ∪ *2: function[set, set → set] == 
  (λ a, b : (λ x : a(x) ∨ b(x))) 

*1 ∩ *2: function[set, set → set] == 
  (λ a, b : (λ x : a(x) ∧ b(x))) 

*1 \ *2: function[set, set → set] == 
  (λ a, b : (λ x : a(x) ∧ ¬b(x))) 

add: function[T, set → set] == (λ x, a : (λ y : x = y ∨ a(y))) 

{*1}: function[T → set] == (λ x : (λ y : y = x)) 

*1 ⊆ *2: function[set, set → bool] = 
  (λ a, b : (∀ z : a(z) ⊃ b(z))) 

*1 ∈ *2: function[T, set → bool] == (λ x, b : b(x)) 

empty: function[set → bool] = (λ a : (∀ x : ¬a(x))) 

∅: set == (λ x : false) 

fullset: set == (λ x : true) 

extensionality: Axiom (∀ x : x ∈ a = x ∈ b) ⊃ (a = b) 

End sets 
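
The sets module represents a set by its characteristic predicate. The following Python sketch mirrors that encoding with closures. The function names are informal transliterations of the operators above, and the explicit finite `universe` argument used to evaluate the quantified definitions (subset, empty) is an assumption of this sketch, since the specification quantifies over the whole type T.

```python
def union(a, b):
    """*1 ∪ *2: membership in either set."""
    return lambda x: a(x) or b(x)


def intersection(a, b):
    """*1 ∩ *2: membership in both sets."""
    return lambda x: a(x) and b(x)


def difference(a, b):
    """*1 \\ *2: membership in a but not in b."""
    return lambda x: a(x) and not b(x)


def add(x, a):
    """add: the set a extended with the element x."""
    return lambda y: x == y or a(y)


def singleton(x):
    """{*1}: the set containing exactly x."""
    return lambda y: y == x


def member(x, b):
    """*1 ∈ *2: apply the characteristic predicate."""
    return b(x)


def subset(a, b, universe):
    """*1 ⊆ *2, with the quantifier restricted to a finite universe."""
    return all(b(z) for z in universe if a(z))


def empty(a, universe):
    """empty(a), with the quantifier restricted to a finite universe."""
    return not any(a(x) for x in universe)


emptyset = lambda x: False   # ∅
fullset = lambda x: True     # fullset
```

For example, with `evens = lambda x: x % 2 == 0` and `small = lambda x: x < 3`, the predicate `intersection(evens, small)` accepts exactly 0 and 2 among the integers 0 to 5.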
cardinality: Module [T: Type] 

Using sets[T] 

Exporting all 
Assuming 
x, y, z: Var T 
N: Var nat 

f: Var function[T → nat] 
finite: Formula 
  (∃ N, f : (∀ x, y : f(x) ≤ N ∧ (f(x) = f(y) ⊃ x = y))) 
Theory 

a, b, c: Var set 
|*1|: function[set → nat] 
card_ax: Axiom |a ∪ b| + |a ∩ b| = |a| + |b| 
card_subset: Axiom a ⊆ b ⊃ |a| ≤ |b| 
card_empty: Axiom |a| = 0 ⇔ empty(a) 
empty_prop: Lemma |a| > 0 ⊃ (∃ x : x ∈ a) 
card_prop: Lemma 
  a ⊆ c ∧ b ⊆ c ∧ 2 * |a| > |c| ∧ 2 * |b| > |c| ⊃ |a ∩ b| > 0 

Proof 

empty_prop_proof: Prove empty_prop {x ← x@p2} from 
  card_empty, empty 

subset_union: Sublemma a ⊆ c ∧ b ⊆ c ⊃ a ∪ b ⊆ c 

subset_union_proof: Prove subset_union from 
  *1 ⊆ *2 {z ← z@p3, b ← c}, 
  *1 ⊆ *2 {z ← z@p3, a ← b, b ← c}, 
  *1 ⊆ *2 {a ← a ∪ b, b ← c} 

m, n, p: Var nat 

twice_prop: Sublemma 2 * m > p ∧ 2 * n > p ⊃ m + n > p 

twice_proof: Prove twice_prop 

card_proof: Prove card_prop from 
  twice_prop {m ← |a|, n ← |b|, p ← |c|}, 
  card_ax, 
  subset_union, 
  card_subset {a ← a ∪ b, b ← c} 

End cardinality 
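
The lemma card_prop states that any two strict majorities of the same finite set overlap; this is the counting fact on which majority voting rests. The Python sketch below checks the statement by brute force over small universes. It is only a sanity check under the assumption of explicit finite sets (Python frozensets rather than characteristic predicates) and is no substitute for the mechanically checked proof.

```python
from itertools import combinations


def subsets(universe):
    """Yield every subset of a finite universe as a frozenset."""
    items = list(universe)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def card_ax_holds(universe):
    """Check |a ∪ b| + |a ∩ b| = |a| + |b| for all subsets a, b."""
    all_subsets = list(subsets(universe))
    return all(
        len(a | b) + len(a & b) == len(a) + len(b)
        for a in all_subsets
        for b in all_subsets
    )


def card_prop_holds(universe):
    """Check that any two strict majorities of the universe intersect."""
    c = frozenset(universe)
    majorities = [a for a in subsets(c) if 2 * len(a) > len(c)]
    return all(len(a & b) > 0 for a in majorities for b in majorities)
```

Calling `card_prop_holds(range(n))` for small n enumerates all pairs of majorities, so the cost grows exponentially with n; the check is intended only for universes of a handful of elements.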
orderedsets: Module [T: Type, ≤: function[T, T → bool]] 
Using sets[T] 

Exporting min with sets[T] 

Assuming 

x, y, z: Var T 
reflexive: Formula x ≤ x 
transitive: Formula x ≤ y ∧ y ≤ z ⊃ x ≤ z 
antisymmetry: Formula x ≤ y ∧ y ≤ x ⊃ x = y 
dichotomy: Formula x ≤ y ∨ y ≤ x 
Theory 
a: Var set 

min: function[set → T] 

min_ax: Axiom min(a) ∈ a ∧ (∀ x : x ∈ a ⊃ min(a) ≤ x) 


End orderedsets 
repl_machine: Module 
Using simple_machine, sets, cardinality 
Exporting all with simple_machine 
Theory 
n: Var nat 
m: Var M 
c: Var C 

voted: Type from C 
voted_ax: Axiom 
  (c in Ca ⊃ c in voted) ∧ (c in voted ⊃ ¬(c in Cs)) 
r: nat 

R: Type from nat with (λ n : n ≤ r) 
i: Var R 

F: function[R → function[M → bool]] 

rstate: Type is function[R → state] 
σ, τ: Var rstate 
maj: function[rstate, C → D] 

A: Var set[R] 
x: Var D 
maj_ax: Axiom 
  (∃ A : 2 * |A| > |fullset[R]| ∧ (∀ i : i ∈ A ⊃ σ(i)(c) = x)) 
    ⊃ maj(σ, c) = x 


vote: function[rstate, C, M → rstate] 
vote_ax: Axiom 
  ¬(F(i)(m)) 
    ⊃ vote(σ, c, m) 
        = if c in voted 
          then σ 
            with [(i)(c) := maj(σ, c)] 
          else σ 
          end if 

sstep: function[rstate, C, M → rstate] 

sstep_ax: Axiom ¬(F(i)(m)) ⊃ sstep(σ, c, m)(i) = step(σ(i), c, m) 

rstep: function[rstate, C, M → rstate] == 
  (λ σ, c, m : vote(sstep(σ, c, m), c, m)) 

rrun: Recursive function[M → rstate] = 
  (λ m : 
    if m = 0 
    then (λ i : initial_state) 
    else rstep(rrun(m − 1), sched(m), m) 
    end if) 
  by identity 

rrunto: function[C → rstate] == (λ c : rrun(when(c))) 

Proof 

disharge_finite: Prove 
  finite[R] {f ← (λ i → nat : i), N ← r} from 
  R_invariant {R_var ← z@c} 


End repl_machine 
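
To illustrate how sstep, vote, and rstep interact, the Python sketch below runs three replicas of a small machine with one transient fault. Everything concrete here is an assumption made for illustration: the cell names, the schedule, the choice of three replicas, and in particular the behaviour of a faulty replica (it writes the value -1 and ignores the vote), which the axioms deliberately leave unconstrained. The sketch also applies the vote replica by replica, which is one reading of vote_ax rather than a transcription of it.

```python
from collections import Counter

R = range(3)                           # assumed: three replicas
SENSORS = {"s"}                        # plays the role of Cs
VOTED = {"act"}                        # voted cells
SCHED = {1: "s", 2: "t", 3: "act"}     # sched: M -> C
FAULTY = {(1, 2)}                      # F(i)(m): replica 1 is faulty in frame 2


def sensor(c, m):
    return 10 * m


TASKS = {"t": lambda s: s["s"] + 1, "act": lambda s: 2 * s["t"]}


def step(s, c, m):
    new = dict(s)
    new[c] = sensor(c, m) if c in SENSORS else TASKS[c](s)
    return new


def maj(sigma, c):
    """Value of cell c held by a strict majority of replicas, else None."""
    value, count = Counter(sigma[i][c] for i in R).most_common(1)[0]
    return value if 2 * count > len(R) else None


def sstep(sigma, c, m):
    """Working replicas take a normal step; faulty ones write -1 (assumed)."""
    return {
        i: ({**sigma[i], c: -1} if (i, m) in FAULTY else step(sigma[i], c, m))
        for i in R
    }


def vote(sigma, c, m):
    """Working replicas overwrite a voted cell with the majority value."""
    if c not in VOTED:
        return sigma
    v = maj(sigma, c)
    return {i: (sigma[i] if (i, m) in FAULTY else {**sigma[i], c: v}) for i in R}


def rstep(sigma, c, m):
    return vote(sstep(sigma, c, m), c, m)


def rrun(m, initial_state):
    sigma = {i: dict(initial_state) for i in R}
    for k in range(1, m + 1):
        sigma = rstep(sigma, SCHED[k], k)
    return sigma


final = rrun(3, {"s": 0, "t": 0, "act": 0})
```

In this run the fault corrupts the unvoted cell "t" on replica 1, so that replica computes a wrong actuator value in frame 3; the vote then replaces it with the majority value, while the corrupted "t" itself remains until it is next recomputed.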


repl_machine_tcc: Module 
Using repl_machine 
Exporting all with repl_machine 
Theory 

n: Var naturalnumber 
m: Var naturalnumber 
x: Var R 

R_TCC1: Formula (∃ n : n ≤ r) 

rrun_TCC1: Formula (¬(m = 0)) ⊃ (m − 1 ≥ 0) 

rrun_TCC2: Formula (¬(m = 0)) ⊃ identity(m) > identity(m − 1) 

Proof 

R_TCC1_PROOF: Prove R_TCC1 
rrun_TCC1_PROOF: Prove rrun_TCC1 
rrun_TCC2_PROOF: Prove rrun_TCC2 


End repl_machine_tcc 

repl_machine_tcc_proofs: Module 
Proof 

Using repl_machine_tcc 
R_TCC1_PROOF: Prove R_TCC1 {n ← r} 
End repl_machine_tcc_proofs 
supports: Module 

Using repl_machine, orderedsets[M, naturalnumbers.≤], sets[C] 

Exporting support, committed_to 
  with repl_machine, orderedsets[M, naturalnumbers.≤], sets[C] 

Theory 

a, b, c: Var C 

foundation: Recursive function[C → set[C]] = 
  (λ c : 
    (λ a : 
      c = a 
        ∨ (¬(c in voted ∨ c in Cs) 
            ∧ (∃ b : (b, c) ∈ G ∧ a ∈ foundation(b))))) 
  by when 

backup: function[C → set[C]] = 
  (λ c : (λ a : (∃ b : (b, c) ∈ G ∧ a ∈ foundation(b)))) 

support: function[C → set[C]] = 
  (λ c : (λ a : a ∈ foundation(c) ∨ (c in voted ∧ a ∈ backup(c)))) 

Gbar_support: Lemma (a, c) ∈ G ⊃ a ∈ support(c) 

in_own_support: Lemma c ∈ support(c) 

subset_support: Lemma 
  ¬(a in voted) ∧ (a, c) ∈ G ⊃ support(a) ⊆ support(c) 

S, T: Var set[C] 
i: Var R 
t, m: Var M 

critical_times: function[C → set[M]] == 
  (λ c : (λ t : sched(t) ∈ support(c))) 

committed_to: function[C → M] == (λ c : min(critical_times(c))) 

commit_when_lemma: Lemma committed_to(c) ≤ when(c) 

commit_support_lemma: Lemma 
  a ∈ support(c) ⊃ committed_to(c) ≤ when(a) 

commit_Gbar_lemma: Lemma 
  (a, c) ∈ G ∧ ¬(a in voted) ⊃ committed_to(c) ≤ committed_to(a) 

Proof 

discharge_reflexive: Prove reflexive 
discharge_transitive: Prove transitive 
discharge_antisymmetry: Prove antisymmetry 
discharge_dichotomy: Prove dichotomy 

support_backup: Sublemma a ∈ support(c) = (c = a ∨ a ∈ backup(c)) 

support_backup_proof: Prove support_backup from 
  support, 
  backup {b ← b@p3}, 
  foundation {b ← b@p2}, 
  sensor_ax {a ← b@p2} 

Gbar_support_prf: Prove Gbar_support from 
  support_backup, backup {b ← a}, foundation {c ← a} 

in_own_support_proof: Prove in_own_support from 
  support_backup {a ← c} 

found_support: Sublemma ¬(c in voted) ⊃ foundation(c) = support(c) 

found_support_proof: Prove found_support from 
  support {a ← x@p2}, 
  extensionality[C] {a ← foundation(c), b ← support(c)} 

found_sub_support: Sublemma (b, c) ∈ G ⊃ foundation(b) ⊆ support(c) 

found_sub_support_proof: Prove found_sub_support from 
  *1 ⊆ *2 [C] {a ← foundation(b), b ← support(c)}, 
  support_backup {a ← z@p1}, 
  backup {b ← b@c, a ← z@p1} 

subset_support_proof: Prove subset_support from 
  found_sub_support {b ← a}, found_support {c ← a} 

committed_lemma: Sublemma 
  committed_to(c) ∈ critical_times(c) 
    ∧ (∀ t : t ∈ critical_times(c) ⊃ t ≥ committed_to(c)) 

committed_proof: Prove committed_lemma from 
  min_ax {a ← critical_times(c), x ← t} 

commit_when_proof: Prove commit_when_lemma from 
  in_own_support, 
  committed_lemma {t ← when(c)}, 
  sched_when_lemma {a ← c} 

commit_support_proof: Prove commit_support_lemma from 
  committed_lemma {t ← when(a)}, sched_when_lemma 

commit_Gbar_lemma_proof: Prove commit_Gbar_lemma from 
  subset_support, 
  *1 ⊆ *2 [C] 
    {a ← support(a), 
     b ← support(c), 
     z ← sched(committed_to(a))}, 
  committed_lemma {t ← committed_to(a)}, 
  committed_lemma {c ← a} 

End supports 
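
The definitions of foundation, backup, support, and committed_to can be exercised on a small dependency graph. The Python sketch below does so for a hypothetical five-cell chain; the cell names, the graph, the choice of voted cells, and the frame numbers are all assumptions made for illustration. Sets are represented as frozensets rather than predicates, and committed_to is computed as the earliest frame of any cell in the support, which matches min(critical_times(c)) when sched and when are inverse to each other.

```python
from functools import lru_cache

# Hypothetical chain: s -> t -> v -> u -> act, with v and act voted.
WHEN = {"s": 1, "t": 2, "v": 3, "u": 4, "act": 5}
G = {("s", "t"), ("t", "v"), ("v", "u"), ("u", "act")}
SENSORS = {"s"}
VOTED = {"v", "act"}


def preds(c):
    """Cells a with (a, c) in G."""
    return {a for (a, b) in G if b == c}


@lru_cache(maxsize=None)
def foundation(c):
    """c itself, plus the foundations of its predecessors
    unless c is a voted cell or a sensor cell."""
    cells = {c}
    if c not in VOTED and c not in SENSORS:
        for b in preds(c):
            cells |= foundation(b)
    return frozenset(cells)


def backup(c):
    """Union of the foundations of the immediate predecessors of c."""
    return frozenset().union(*(foundation(b) for b in preds(c)))


def support(c):
    """foundation(c), extended by backup(c) when c is voted."""
    return foundation(c) | (backup(c) if c in VOTED else frozenset())


def committed_to(c):
    """Earliest frame in which some cell of support(c) is scheduled."""
    return min(WHEN[a] for a in support(c))
```

In this example support("act") is {act, u, v}: the chain of dependencies is cut at the voted cell v, so committed_to("act") is frame 3 rather than frame 1. This is how voting bounds the interval over which a replica must stay fault-free.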
supports_tcc: Module 
Using supports 
Exporting all with supports 
Theory 

a: Var simple_machine.C 

c: Var simple_machine.C 

z: Var simple_machine.C 

x: Var simple_machine.C 

b: Var simple_machine.C 

foundation_TCC1: Formula 
  ((b, c) ∈ G) ∧ (¬(c in voted ∨ c in Cs)) ∧ (¬(c = a)) 
    ⊃ when(c) > when(b) 

Proof 

foundation_TCC1_PROOF: Prove foundation_TCC1 
End supports_tcc 

supports_tcc_proofs: Module 
Proof 

Using supports_tcc 

foundation_TCC1_PROOF: Prove foundation_TCC1 from 
  Gbar_when {a ← b} 

End supports_tcc_proofs 
correctness: Module 

Using supports, sets[R], cardinality[R] 

Exporting all with supports, sets[R] 

Theory 
i, j: Var R 
a, c: Var C 
m: Var M 

OK: function[R → set[C]] = 
  (λ i : 
    (λ c : 
      (∀ m : committed_to(c) ≤ m ∧ m ≤ when(c) ⊃ ¬F(i)(m)))) 

working: function[C → set[R]] == (λ c : (λ i : OK(i)(c))) 

MOK: function[C → bool] = (λ c : 2 * |working(c)| > |fullset[R]|) 

safe: Recursive function[C → bool] = 
  (λ c : MOK(c) ∧ (∀ a : (a, c) ∈ G ⊃ safe(a))) by when 

correct: function[C → bool] = 
  (λ c : (∀ j : OK(j)(c) ⊃ rrunto(c)(j)(c) = runto(c)(c))) 

the_result: Theorem safe(c) ⊃ correct(c) 


End correctness 
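
The predicates OK, working, MOK, and safe can be evaluated directly for a given fault pattern. The Python sketch below does this for a hypothetical five-cell chain with three replicas. The graph, the frame numbers, the tabulated committed_to values, and the fault patterns are assumptions chosen for illustration; in the specification committed_to is derived from the support sets rather than given as a table.

```python
R = range(3)                                    # assumed: three replicas
WHEN = {"s": 1, "t": 2, "v": 3, "u": 4, "act": 5}
COMMITTED_TO = {"s": 1, "t": 1, "v": 1, "u": 3, "act": 3}   # assumed values
G = {("s", "t"), ("t", "v"), ("v", "u"), ("u", "act")}


def OK(i, c, faults):
    """Replica i is fault-free in every frame from committed_to(c) to when(c)."""
    return all((i, m) not in faults
               for m in range(COMMITTED_TO[c], WHEN[c] + 1))


def working(c, faults):
    """Replicas that are OK for cell c."""
    return {i for i in R if OK(i, c, faults)}


def MOK(c, faults):
    """A strict majority of replicas is OK for cell c."""
    return 2 * len(working(c, faults)) > len(R)


def safe(c, faults):
    """MOK holds for c and, recursively, for every predecessor of c."""
    return MOK(c, faults) and all(
        safe(a, faults) for (a, b) in G if b == c
    )


one_fault = {(1, 2)}            # replica 1 faulty in frame 2
two_faults = {(1, 2), (0, 3)}   # additionally, replica 0 faulty in frame 3
```

With `one_fault`, replica 1 is not OK for cells t and v, but it becomes OK again for u and act because their commitment interval starts at frame 3; every cell keeps a working majority, so safe("act", one_fault) holds. With `two_faults`, only replica 2 is OK for v, the majority is lost, and safe("act", two_faults) fails.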



correctness_tcc: Module 
Using correctness 
Exporting all with correctness 
Theory 

a: Var simple_machine.C 
c: Var simple_machine.C 

safe_TCC1: Formula ((a, c) ∈ G) ∧ (MOK(c)) ⊃ when(c) > when(a) 

Proof 

safe_TCC1_PROOF: Prove safe_TCC1 
End correctness_tcc 

correctness_tcc_proofs: Module 
Proof 

Using correctness_tcc 

safe_TCC1_PROOF: Prove safe_TCC1 from Gbar_when 
End correctness_tcc_proofs 
connect: Module 

Using correctness, natinduction, simple_props 
Exporting all 
Theory 
a, c: Var C 
j: Var R 


a_correct_at_c: function[C, C → bool] = 
  (λ a, c : 
    (∀ j : 
      OK(j)(c) ⊃ rrunto(previous(c))(j)(a) = runto(previous(c))(a))) 

stay_correct: Lemma 
  (∀ a : (a, c) ∈ G ⊃ safe(c) ∧ correct(a)) 
    ⊃ (∀ a : (a, c) ∈ G ⊃ a_correct_at_c(a, c)) 

Proof 

i: Var R 

m: Var M 

r_indstep: Lemma 
  OK(j)(c) 
    ∧ (a, c) ∈ G 
    ∧ when(a) ≤ m 
    ∧ m < when(c) ∧ rrun(m)(j)(a) = rrunto(a)(j)(a) 
    ⊃ rrun(m + 1)(j)(a) = rrunto(a)(j)(a) 
r_indstep_proof: Prove r_indstep from 
  rrun {m ← m + 1}, 
  vote_ax 
    {σ ← sstep(rrun(m), sched(m + 1), m + 1), 
     c ← sched(m + 1), 
     m ← m + 1, 
     i ← j}, 
  sstep_ax 
    {σ ← rrun(m), 
     c ← sched(m + 1), 
     m ← m + 1, 
     i ← j}, 
  step {s ← rrun(m)(j), c ← sched(m + 1), m ← m + 1}, 
  unique_when {p ← when(a), q ← m + 1}, 
  sched_when_lemma, 
  OK {i ← j, m ← m + 1}, 
  commit_support_lemma, 
  Gbar_support 

q: Var M 

stay_correct_repl: Lemma 
  (a, c) ∈ G ∧ OK(j)(c) ⊃ rrunto(previous(c))(j)(a) = rrunto(a)(j)(a) 

stay_correct_repl_proof: Prove stay_correct_repl from 
  limited_induction 
    {p ← (λ q : rrun(q)(j)(a) = rrunto(a)(j)(a)), 
     m ← when(a), 
     m1 ← when(c), 
     n ← when(previous(c))}, 
  r_indstep {m ← i@p1}, 
  sched_when_lemma {a ← previous(c)}, 
  Gbar_when, 
  when_sched_lemma {m ← pred(when(c))}, 
  dowhen_pos 

Gbar_OK: Lemma (a, c) ∈ G ∧ ¬(a in voted) ⊃ (OK(i)(c) ⊃ OK(i)(a)) 


Gbar_OK_proof: Prove Gbar_OK from 
  *1 ⊆ *2 [C] {a ← support(a), b ← support(c)}, 
  OK {m ← m@p3}, 
  OK {c ← a}, 
  Gbar_when, 
  commit_Gbar_lemma, 
  subset_support 

notvoted_transfer_correct: Lemma 
  (a, c) ∈ G ∧ safe(c) ∧ ¬(a in voted) ∧ correct(a) 
    ⊃ OK(j)(c) ⊃ rrunto(a)(j)(a) = runto(a)(a) 

notvoted_proof: Prove notvoted_transfer_correct from 
  Gbar_OK {i ← j}, correct {c ← a} 

torch_carried: Lemma 
  (a, c) ∈ G ∧ safe(c) ⊃ (∃ j : OK(j)(a) ∧ OK(j)(c)) 

torch_proof: Prove torch_carried {j ← x@p2} from 
  card_prop[R] 
    {a ← working(c), 
     b ← working(a), 
     c ← fullset[R]}, 
  empty_prop[R] {a ← working(c) ∩ working(a)}, 
  safe, 
  safe {c ← a}, 
  MOK, 
  MOK {c ← a}, 
  *1 ⊆ *2 [R] {a ← working(c), b ← fullset[R]}, 
  *1 ⊆ *2 [R] {a ← working(a), b ← fullset[R]} 

σ: Var rstate 

vote_appln: Lemma 
  ¬(F(i)(when(a))) ∧ a in voted 
    ⊃ vote(σ, a, when(a))(i)(a) = maj(σ, a) 

vote_appln_proof: Prove vote_appln from 
  vote_ax {c ← a, m ← when(a)} 

safe_at_a: Lemma OK(i)(c) ∧ (a, c) ∈ G ⊃ ¬(F(i)(when(a))) 

safe_at_a_proof: Prove safe_at_a from 
  OK {m ← when(a)}, Gbar_when, Gbar_support, commit_support_lemma 
OK_OK: Lemma 
  safe(c) ∧ OK(i)(c) ∧ OK(j)(c) ∧ (a, c) ∈ G ∧ a in voted 
    ⊃ rrunto(a)(i)(a) = rrunto(a)(j)(a) 

OK_OK_proof: Prove OK_OK from 
  rrun {m ← when(a)}, 
  sched_when_lemma, 
  nat_invariant {nat_var ← when(a)}, 
  vote_appln {σ ← sstep(rrun(pred(when(a))), a, when(a))}, 
  safe_at_a, 
  vote_appln 
    {i ← j, σ ← sstep(rrun(pred(when(a))), a, when(a))}, 
  safe_at_a {i ← j} 

voted_transfer_correct: Lemma 
  (a, c) ∈ G ∧ safe(c) ∧ a in voted ∧ correct(a) 
    ⊃ OK(j)(c) ⊃ rrunto(a)(j)(a) = runto(a)(a) 

voted_proof: Prove voted_transfer_correct from 
  OK_OK {i ← j@p2}, 
  torch_carried, 
  correct {c ← a, j ← j@p2} 

unvoted_transfer_correct: Lemma 
  (a, c) ∈ G ∧ safe(c) ∧ correct(a) 
    ⊃ OK(j)(c) ⊃ rrunto(a)(j)(a) = runto(a)(a) 

unvoted_proof: Prove unvoted_transfer_correct from 
  voted_transfer_correct, notvoted_transfer_correct 

stay_correct_proof: Prove stay_correct from 
  stay_correct_simple, 
  stay_correct_repl {j ← j@p3}, 
  a_correct_at_c, 
  when_sched_lemma {m ← pred(when(c))}, 
  unvoted_transfer_correct {j ← j@p3} 


End connect 
sensor_step: Module 

Using correctness, simple_props 

Exporting with correctness, simple_props 

Theory 
a, c: Var C 

sensor_inductive_step: Lemma 
  c in Cs ∧ (∀ a : (a, c) ∈ G ⊃ safe(c) ∧ correct(a)) ⊃ correct(c) 

Proof 
j: Var R 

sensor_step_lemma: Lemma 
  when(c) > 0 ∧ ¬(c in voted) 
    ⊃ OK(j)(c) 
      ⊃ rrunto(c)(j) = step(rrun(pred(when(c)))(j), c, when(c)) 

sensor_step_proof: Prove sensor_step_lemma from 
  rrun {m ← when(c)}, 
  vote_ax 
    {i ← j, 
     m ← when(c), 
     σ ← sstep(rrun(pred(when(c))), c, when(c))}, 
  sstep_ax 
    {i ← j, 
     σ ← rrun(pred(when(c))), 
     m ← when(c)}, 
  sched_when_lemma {a ← c}, 
  OK {i ← j, m ← when(c)}, 
  commit_when_lemma 

sensor_rrunto_lemma: Lemma 
  when(c) > 0 ∧ c in Cs 
    ⊃ OK(j)(c) ⊃ rrunto(c)(j)(c) = sensor(c)(when(c)) 

sensor_rrunto_proof: Prove sensor_rrunto_lemma from 
  sensor_step_lemma, 
  step 
    {s ← rrun(pred(when(c)))(j), 
     m ← when(c), 
     c ← c}, 
  voted_ax 

main_sensor_lemma: Lemma 
  when(c) > 0 ∧ c in Cs ⊃ OK(j)(c) ⊃ rrunto(c)(j)(c) = runto(c)(c) 

main_sensor_proof: Prove main_sensor_lemma from 
  simple_sensor_step_lemma, sensor_rrunto_lemma 

sensor_ind_step_proof: Prove sensor_inductive_step from 
  dowhen_pos, main_sensor_lemma {j ← j@p3}, correct, sensor_ax 


End sensor_step 
sensor_step_tcc: Module 
Using sensor_step 
Exporting all with sensor_step 
Theory 

c: Var simple_machine.C 

j: Var repl_machine.R 

sensor_rrunto_lemma_TCC1: Formula 
  (OK(j)(c)) ∧ (when(c) > 0 ∧ c in Cs) 
    ⊃ (cell_type(c) = sensor_cell) 

Proof 

sensor_rrunto_lemma_TCC1_PROOF: Prove sensor_rrunto_lemma_TCC1 
End sensor_step_tcc 
nonvoted_step: Module 
Using correctness, connect 
Exporting with correctness, connect 
Theory 
a, c: Var C 
j: Var R 

nonvoted_inductive_step: Lemma 
  ¬(c in Cs) 
    ∧ ¬(c in voted) ∧ (∀ a : (a, c) ∈ G ⊃ safe(c) ∧ correct(a)) 
    ⊃ correct(c) 

nonvoted_task_OK: Lemma 
  ¬(c in Cs) ∧ (∀ a : (a, c) ∈ G ⊃ a_correct_at_c(a, c)) 
    ⊃ OK(j)(c) 
      ⊃ task(c)(rrunto(previous(c))(j)) = task(c)(runto(previous(c))) 

all_correct_at_c: function[C → bool] = 
  (λ c : (∀ a : (a, c) ∈ G ⊃ a_correct_at_c(a, c))) 

Proof 


nonvoted_task_OK_proof: Prove nonvoted_task_OK {a ← a@p2} from 
  a_correct_at_c {a ← a@p2}, 
  dependency 
    {s ← rrun(pred(when(c)))(j), 
     t ← run(pred(when(c)))}, 
  dowhen_previous 


nonvoted_rrunto_task: Lemma 
  ¬(c in Cs) 
    ∧ ¬(c in voted) ∧ (∀ a : (a, c) ∈ G ⊃ a_correct_at_c(a, c)) 
    ⊃ OK(j)(c) ⊃ rrunto(c)(j)(c) = task(c)(rrun(pred(when(c)))(j)) 
nonvoted_rrunto_task_proof: Prove nonvoted_rrunto_task from 
  rrun {m ← when(c)}, 
  vote_ax 
    {σ ← sstep(rrun(pred(when(c))), c, when(c)), 
     m ← when(c), 
     i ← j}, 
  sstep_ax 
    {…, 
     σ ← rrun(pred(when(c))), 
     m ← when(c)}, 
  step {m ← when(c), s ← rrun(pred(when(c)))(j)}, 
  sched_when_lemma {a ← c}, 
  OK {i ← j, m ← when(c)}, 
  commit_when_lemma, 
  dowhen_pos 

link: Lemma 
  ¬(c in Cs) ∧ ¬(c in voted) ∧ (∀ a : (a, c) ∈ G ⊃ a_correct_at_c(a, c)) 
    ⊃ OK(j)(c) ⊃ rrunto(c)(j)(c) = runto(c)(c) 

link_proof: Prove link {a ← a@p6} from 
  nonvoted_rrunto_task, 
  simple_step_lemma, 
  nonvoted_task_OK, 
  dowhen_previous, 
  all_correct_at_c {a ← a@p3}, 
  all_correct_at_c {a ← a@p1} 

main_non_voted_lemma: Lemma 
  ¬(c in Cs) 
    ∧ ¬(c in voted) ∧ (∀ a : (a, c) ∈ G ⊃ safe(c) ∧ correct(a)) 
    ⊃ OK(j)(c) ⊃ rrunto(c)(j)(c) = runto(c)(c) 

main_nonvoted_proof: Prove main_non_voted_lemma {a ← a@p2} from 
  link, stay_correct {a ← a@p1} 

nonvoted_ind_proof: Prove nonvoted_inductive_step {a ← a@p1} from 
  main_non_voted_lemma {j ← j@p2}, correct 

End nonvoted_step 
nonvoted_step_tcc: Module 

Using nonvoted_step 
Exporting all with nonvoted_step 
Theory 

c: Var simple_machine.C 

j: Var repl_machine.R 

a: Var simple_machine.C 

nonvoted_task_OK_TCC1: Formula 
  (OK(j)(c)) 
    ∧ (¬(c in Cs) ∧ (∀ a : (a, c) ∈ G ⊃ a_correct_at_c(a, c))) 
    ⊃ (cell_type(c) ≠ sensor_cell) 

nonvoted_rrunto_task_TCC1: Formula 
  (OK(j)(c)) 
    ∧ (¬(c in Cs) 
        ∧ ¬(c in voted) 
        ∧ (∀ a : (a, c) ∈ G ⊃ a_correct_at_c(a, c))) 
    ⊃ (cell_type(c) ≠ sensor_cell) 


Proof 

nonvoted_task_OK_TCC1_PROOF: Prove nonvoted_task_OK_TCC1 
nonvoted_rrunto_task_TCC1_PROOF: Prove nonvoted_rrunto_task_TCC1 


End nonvoted_step_tcc 


voted_step: Module 
Using correctness, connect, nonvoted_step 
Exporting induction_body with correctness, connect 
Theory 
a, c: Var C 


voted_inductive_step: Lemma 
  c in voted ∧ (∀ a : (a, c) ∈ G ⊃ safe(c) ∧ correct(a)) 
    ⊃ correct(c) 

induction_body: function[C → bool] = 
  (λ c : (∀ a : (a, c) ∈ G ⊃ safe(c) ∧ correct(a))) 

Proof 

i, j: Var R 
σ: Var rstate 
m: Var M 

voted_step_lemma: Lemma 
  c in voted 
    ⊃ OK(j)(c) 
      ⊃ sstep(rrun(pred(when(c))), c, when(c))(j)(c) 
          = task(c)(rrun(pred(when(c)))(j)) 


voted_step_proof: Prove voted_step_lemma from 
  sstep_ax 
    {i ← j, 
     σ ← rrun(pred(when(c))), 
     m ← when(c)}, 
  step {m ← when(c), s ← rrun(pred(when(c)))(j)}, 
  OK {i ← j, m ← when(c)}, 
  commit_when_lemma, 
  voted_ax 
sstep_task_lemma: Lemma 
  c in voted ∧ (∀ a : (a, c) ∈ G ⊃ a_correct_at_c(a, c)) 
    ⊃ OK(j)(c) 
      ⊃ sstep(rrun(pred(when(c))), c, when(c))(j)(c) 
          = task(c)(run(pred(when(c)))) 


sstep_task_proof: Prove sstep_task_lemma {a ← a@p2} from 
  voted_step_lemma, nonvoted_task_OK, dowhen_previous, voted_ax 

x: Var D 

maj_lemma: Lemma 
  MOK(c) ∧ (∀ i : OK(i)(c) ⊃ σ(i)(c) = x) ⊃ maj(σ, c) = x 

maj_proof: Prove maj_lemma {i ← i@p1} from 
  maj_ax {A ← working(c)}, MOK 

vote_lemma: Lemma 
  OK(j)(c) 
    ∧ MOK(c) 
    ∧ c in voted 
    ∧ committed_to(c) ≤ m 
    ∧ m ≤ when(c) 
    ∧ (∀ i : OK(i)(c) ⊃ sstep(σ, c, m)(i)(c) = x) 
    ⊃ rstep(σ, c, m)(j)(c) = x 

vote_lemma_proof: Prove vote_lemma {i ← i@p2} from 
  vote_ax {i ← j, σ ← sstep(σ, c, m)}, 
  maj_lemma {σ ← sstep(σ, c, m)}, 
  OK {i ← j} 


rstep_task: Lemma 
  MOK(c) 
    ∧ c in voted 
    ∧ OK(j)(c) ∧ (∀ a : (a, c) ∈ G ⊃ a_correct_at_c(a, c)) 
    ⊃ rstep(rrun(pred(when(c))), c, when(c))(j)(c) 
        = task(c)(run(pred(when(c)))) 

active_task: function[C → Ct] == 
  (λ c → Ct : if c in Cs then arb_task else c end if) 
rstep_task_proof: Prove rstep_task {a ← a@p1} from 
  sstep_task_lemma {j ← i@p2}, 
  vote_lemma 
    {x ← task(active_task(c))(run(pred(when(c)))), 
     σ ← rrun(pred(when(c))), 
     m ← when(c)}, 
  commit_when_lemma, 
  voted_ax 


rrunto_task: Lemma 
  MOK(c) 
    ∧ c in voted 
    ∧ OK(j)(c) ∧ (∀ a : (a, c) ∈ G ⊃ a_correct_at_c(a, c)) 
    ⊃ rrunto(c)(j)(c) = task(c)(run(pred(when(c)))) 


rrunto_task_proof: Prove rrunto_task {a ← a@p1} from 
  rstep_task, 
  rrun {m ← when(c)}, 
  dowhen_pos, 
  sched_when_lemma {a ← c} 


voted_link_lemma: Lemma 
  c in voted ∧ MOK(c) ∧ (∀ a : (a, c) ∈ G ⊃ a_correct_at_c(a, c)) 
    ⊃ OK(j)(c) ⊃ rrunto(c)(j)(c) = runto(c)(c) 


voted_link_proof: Prove voted_link_lemma {a ← a@p1} from 
  rrunto_task, simple_step_lemma, voted_ax 


main_voted_lemma: Lemma 
  c in voted ∧ induction_body(c) 
    ⊃ OK(j)(c) ⊃ rrunto(c)(j)(c) = runto(c)(c) 


sensors_not_voted: Lemma c in voted ⊃ ¬(c in Cs) 


sensors_not_voted_proof: Prove sensors_not_voted from voted_ax 
main_vote_proof: Prove main_voted_lemma from 
  voted_link_lemma, 
  safe, 
  stay_correct {a ← a@p1}, 
  sensor_ax, 
  sensors_not_voted, 
  induction_body {a ← a@p3}, 
  induction_body {a ← a@p4} 

voted_ind_step_proof: Prove voted_inductive_step {a ← a@p3} from 
  main_voted_lemma {j ← j@p2}, correct, induction_body 


End voted_step 


voted_step_tcc: Module 
Using voted_step 
Exporting all with voted_step 
Theory 

c: Var simple_machine.C 

j: Var repl_machine.R 

i: Var repl_machine.R 

a: Var simple_machine.C 

voted_step_lemma_TCC1: Formula 
  (OK(j)(c)) ∧ (c in voted) ⊃ (cell_type(c) ≠ sensor_cell) 

sstep_task_lemma_TCC1: Formula 
  (OK(j)(c)) 
    ∧ (c in voted ∧ (∀ a : (a, c) ∈ G ⊃ a_correct_at_c(a, c))) 
    ⊃ (cell_type(c) ≠ sensor_cell) 

rstep_task_TCC1: Formula 
  (MOK(c) ∧ c in voted 
      ∧ OK(j)(c) ∧ (∀ a : (a, c) ∈ G ⊃ a_correct_at_c(a, c))) 
    ⊃ (cell_type(c) ≠ sensor_cell) 

active_task_TCC1: Formula 
  (cell_type(if c in Cs then arb_task else c end if) ≠ sensor_cell) 

Proof 

voted_step_lemma_TCC1_PROOF: Prove voted_step_lemma_TCC1 
sstep_task_lemma_TCC1_PROOF: Prove sstep_task_lemma_TCC1 
rstep_task_TCC1_PROOF: Prove rstep_task_TCC1 
active_task_TCC1_PROOF: Prove active_task_TCC1 


End voted_step_tcc 
voted_step_tcc_proofs: Module 
Proof 

Using voted_step_tcc 

voted_step_lemma_TCC1_PROOF: Prove voted_step_lemma_TCC1 from 
  voted_ax 

sstep_task_lemma_TCC1_PROOF: Prove sstep_task_lemma_TCC1 from 
  voted_ax 

rstep_task_TCC1_PROOF: Prove rstep_task_TCC1 from voted_ax 
End voted_step_tcc_proofs 
correctness_proof: Module 

Using correctness, voted_step, nonvoted_step, sensor_step, 
  noetherian[C, (*1,*2) ∈ G] 

Exporting with correctness 

Proof 

a, c: Var C 

discharge_well_founded: Prove well_founded {measure ← when} from 
  Gbar_when {c ← b} 

inductive_step: Lemma 
  (∀ a : (a, c) ∈ G ⊃ safe(c) ∧ correct(a)) ⊃ correct(c) 

almost_final_proof: Prove inductive_step {a ← a@p7} from 
  sensor_inductive_step, 
  voted_inductive_step, 
  nonvoted_inductive_step, 
  induction_body {a ← a@p1}, 
  induction_body {a ← a@p2}, 
  induction_body {a ← a@p3}, 
  induction_body 

final_proof: Prove the_result from 
  mod_induction 
    {A ← safe, 
     B ← correct, 
     d ← c, 
     d2 ← a@p3}, 
  safe {a ← d4@p1, c ← d3@p1}, 
  inductive_step {c ← d1@p1} 

End correctness_proof 
outputs: Module 
Using correctness 
Exporting all with correctness 
Theory 
c: Var C 
j: Var R 

actuators_correct: Corollary 
  c in voted ∧ safe(c) ∧ ¬F(j)(when(c)) 
    ⊃ rrunto(c)(j)(c) = runto(c)(c) 


Proof 
a: Var C 
σ: Var rstate 
i: Var R 
m: Var M 

vote_gives_maj: Lemma 
  ¬F(i)(m) ∧ a in voted ⊃ vote(σ, a, m)(i)(a) = maj(σ, a) 
vote_gives_maj_proof: Prove vote_gives_maj from vote_ax {c ← a} 

rrun_gets_maj: Lemma 
  ¬F(i)(when(a)) ∧ a in voted 
    ⊃ rrunto(a)(i)(a) = maj(sstep(rrun(pred(when(a))), a, when(a)), a) 

rrun_gets_maj_proof: Prove rrun_gets_maj from 
  rrun {m ← when(a)}, 
  vote_gives_maj 
    {σ ← sstep(rrun(pred(when(a))), a, when(a)), 
     m ← when(a)}, 
  dowhen_pos {c ← a}, 
  sched_when_lemma 

working_agreement: Lemma 
  ¬F(i)(when(a)) ∧ ¬F(j)(when(a)) ∧ a in voted 
    ⊃ rrunto(a)(i)(a) = rrunto(a)(j)(a) 
working_agreement_proof: Prove working_agreement from 
  rrun_gets_maj, rrun_gets_maj {i ← j} 

safe_OK: Lemma safe(c) ⊃ (∃ j : OK(j)(c)) 

safe_OK_proof: Prove safe_OK {j ← x@p4} from 
  safe, 
  MOK, 
  nat_invariant {nat_var ← |fullset[R]|}, 
  empty_prop[R] {a ← working(c)} 

actuators_correct_proof: Prove actuators_correct from 
  the_result {c ← c@c}, 
  correct {j ← i@p3, c ← c@c}, 
  working_agreement {a ← c@c, i ← j@p4}, 
  safe_OK, 
  OK {m ← when(c), i ← i@p3}, 
  commit_when_lemma 

End outputs 



Appendix B 

Cross-Reference Listing 


This Appendix provides a cross-reference listing of the identifiers declared in the
Ehdm specification. It should assist in reading and navigating the Ehdm specifications
in Appendix A.

Identifier                            Declaration    Module

a_correct_at_c                        defined-fn     connect
active_task                           literal-fn     voted_step
active_tasks                          subtype-with   simple_machine
active_tasks_TCC1                     formula        simple_machine_tcc
active_tasks_TCC1_PROOF               prove          simple_machine_tcc
active_tasks_TCC1_PROOF               prove          simple_machine_tcc_proofs
active_task_TCC1                      formula        voted_step_tcc
active_task_TCC1_PROOF                prove          voted_step_tcc
actuators                             subtype-with   simple_machine
actuators_correct                     formula        outputs
actuators_correct_proof               prove          outputs
actuators_TCC1                        formula        simple_machine_tcc
actuators_TCC1_PROOF                  prove          simple_machine_tcc
actuators_TCC1_PROOF                  prove          simple_machine_tcc_proofs
add                                   literal-fn     sets
all_correct_at_c                      defined-fn     nonvoted_step
almost_final_proof                    prove          correctness_proof
antisymmetry                          formula        orderedsets
arb_actuator                          const          simple_machine
arb_task                              const          simple_machine
backup                                defined-fn     supports
C                                     type           simple_machine
card                                  function       cardinality
card_ax                               axiom          cardinality
card_empty                            axiom          cardinality
cardinality                           module         cardinality
card_proof                            prove          cardinality
card_prop                             formula        cardinality
card_subset                           axiom          cardinality
cell_type                             function       simple_machine
cell_types                            type           simple_machine
commit_Gbar_lemma                     formula        supports
commit_Gbar_lemma_proof               prove          supports
commit_support_lemma                  formula        supports
commit_support_proof                  prove          supports
committed_lemma                       formula        supports
committed_proof                       prove          supports
committed_to                          literal-fn     supports
commit_when_lemma                     formula        supports
commit_when_proof                     prove          supports
connect                               module         connect
correct                               defined-fn     correctness
correctness                           module         correctness
correctness_proof                     module         correctness_proof
correctness_tcc                       module         correctness_tcc
correctness_tcc_proofs                module         correctness_tcc_proofs
critical_times                        literal-fn     supports
D                                     type           simple_machine
dependency                            axiom          simple_machine
dependency_TCC1                       formula        simple_machine_tcc
dependency_TCC1_PROOF                 prove          simple_machine_tcc
dichotomy                             formula        orderedsets
difference                            literal-fn     sets
discharge                             prove          natinduction
discharge_antisymmetry                prove          supports
discharge_dichotomy                   prove          supports
discharge_reflexive                   prove          supports
discharge_transitive                  prove          supports
discharge_well_founded                prove          correctness_proof
disharge_finite                       prove          repl_machine
dowhen                                function       simple_machine
dowhen_pos                            axiom          simple_machine
dowhen_previous                       formula        simple_machine
dowhen_prev_proof                     prove          simple_machine
empty                                 defined-fn     sets
empty_prop                            formula        cardinality
empty_prop_proof                      prove          cardinality
emptyset                              literal-const  sets
extensionality                        axiom          sets
F                                     function       repl_machine
final_proof                           prove          correctness_proof
finite                                formula        cardinality
foundation                            recursive-fn   supports
foundation_TCC1                       formula        supports_tcc
foundation_TCC1_PROOF                 prove          supports_tcc
foundation_TCC1_PROOF                 prove          supports_tcc_proofs
found_sub_support                     formula        supports
found_sub_support_proof               prove          supports
found_support                         formula        supports
found_support_proof                   prove          supports
fullset                               literal-const  sets
Gbar                                  function       simple_machine
Gbar_OK                               formula        connect
Gbar_OK_proof                         prove          connect
Gbar_support                          formula        supports
Gbar_support_prf                      prove          supports
Gbar_when                             axiom          simple_machine
general_induction                     axiom          noetherian
identity                              literal-fn     natinduction
identity                              literal-fn     simple_machine
ind_m_proof                           prove          natinduction
ind_m_proof_TCC1                      formula        natinduction_tcc
ind_m_proof_TCC1_PROOF                prove          natinduction_tcc
ind_proof                             prove          natinduction
indstep                               formula        simple_props
indstep_proof                         prove          simple_props
induction                             formula        natinduction
induction_body                        defined-fn     voted_step
induction_m                           formula        natinduction
inductive_step                        formula        correctness_proof
in_own_support                        formula        supports
in_own_support_proof                  prove          supports
instance                              module         natinduction
intersection                          literal-fn     sets
limited_induction                     formula        natinduction
limited_proof                         prove          natinduction
link                                  formula        nonvoted_step
link_proof                            prove          nonvoted_step
M                                     type           simple_machine
main_non_voted_lemma                  formula        nonvoted_step
main_nonvoted_proof                   prove          nonvoted_step
main_sensor_lemma                     formula        sensor_step
main_sensor_proof                     prove          sensor_step
main_voted_lemma                      formula        voted_step
main_vote_proof                       prove          voted_step
maj                                   function       repl_machine
maj_ax                                axiom          repl_machine
maj_lemma                             formula        voted_step
maj_proof                             prove          voted_step
member                                literal-fn     sets
min                                   function       orderedsets
min_ax                                axiom          orderedsets
mod_induction                         formula        noetherian
mod_proof                             prove          noetherian
MOK                                   defined-fn     correctness
natinduction                          module         natinduction
natinduction_tcc                      module         natinduction_tcc
noetherian                            module         noetherian
nonvoted_ind_proof                    prove          nonvoted_step
nonvoted_inductive_step               formula        nonvoted_step
nonvoted_rrunto_task                  formula        nonvoted_step
nonvoted_rrunto_task_proof            prove          nonvoted_step
nonvoted_rrunto_task_TCC1             formula        nonvoted_step_tcc
nonvoted_rrunto_task_TCC1_PROOF       prove          nonvoted_step_tcc
nonvoted_step                         module         nonvoted_step
nonvoted_step_tcc                     module         nonvoted_step_tcc
nonvoted_task_OK                      formula        nonvoted_step
nonvoted_task_OK_proof                prove          nonvoted_step
nonvoted_task_OK_TCC1                 formula        nonvoted_step_tcc
nonvoted_task_OK_TCC1_PROOF           prove          nonvoted_step_tcc
notvoted_proof                        prove          connect
notvoted_transfer_correct             formula        connect
OK                                    defined-fn     correctness
OK_OK                                 formula        connect
OK_OK_proof                           prove          connect
orderedsets                           module         orderedsets
outputs                               module         outputs
prev                                  literal-fn     natinduction
previous                              literal-fn     simple_machine
r                                     const          repl_machine
R                                     subtype-with   repl_machine
reflexive                             formula        orderedsets
repl_machine                          module         repl_machine
repl_machine_tcc                      module         repl_machine_tcc
repl_machine_tcc_proofs               module         repl_machine_tcc_proofs
r_indstep                             formula        connect
r_indstep_proof                       prove          connect
rrun                                  recursive-fn   repl_machine
rrun_gets_maj                         formula        outputs
rrun_gets_maj_proof                   prove          outputs
rrun_TCC1                             formula        repl_machine_tcc
rrun_TCC1_PROOF                       prove          repl_machine_tcc
rrun_TCC2                             formula        repl_machine_tcc
rrun_TCC2_PROOF                       prove          repl_machine_tcc
rrunto                                literal-fn     repl_machine
rrunto_task                           formula        voted_step
rrunto_task_proof                     prove          voted_step
rstate                                type           repl_machine
rstep                                 literal-fn     repl_machine
rstep_task                            formula        voted_step
rstep_task_proof                      prove          voted_step
rstep_task_TCC1                       formula        voted_step_tcc
rstep_task_TCC1_PROOF                 prove          voted_step_tcc
rstep_task_TCC1_PROOF                 prove          voted_step_tcc_proofs
R_TCC1                                formula        repl_machine_tcc
R_TCC1_PROOF                          prove          repl_machine_tcc
R_TCC1_PROOF                          prove          repl_machine_tcc_proofs
run                                   recursive-fn   simple_machine
run_TCC1                              formula        simple_machine_tcc
run_TCC1_PROOF                        prove          simple_machine_tcc
run_TCC2                              formula        simple_machine_tcc
run_TCC2_PROOF                        prove          simple_machine_tcc
runto                                 literal-fn     simple_machine
safe                                  recursive-fn   correctness
safe_at_a                             formula        connect
safe_at_a_proof                       prove          connect
safe_OK                               formula        outputs
safe_OK_proof                         prove          outputs
safe_TCC1                             formula        correctness_tcc
safe_TCC1_PROOF                       prove          correctness_tcc
safe_TCC1_PROOF                       prove          correctness_tcc_proofs
sched                                 function       simple_machine
sched_when_ax                         axiom          simple_machine
sched_when_lemma                      formula        simple_machine
sched_when_proof                      prove          simple_machine
sensor                                function       simple_machine
sensor_ax                             axiom          simple_machine
sensor_fn                             type           simple_machine
sensor_ind_step_proof                 prove          sensor_step
sensor_inductive_step                 formula        sensor_step
sensor_rrunto_lemma                   formula        sensor_step
sensor_rrunto_lemma_TCC1              formula        sensor_step_tcc
sensor_rrunto_lemma_TCC1_PROOF        prove          sensor_step_tcc
sensor_rrunto_proof                   prove          sensor_step
sensors                               subtype-with   simple_machine
sensors_not_voted                     formula        voted_step
sensors_not_voted_proof               prove          voted_step
sensors_TCC1                          formula        simple_machine_tcc
sensors_TCC1_PROOF                    prove          simple_machine_tcc
sensors_TCC1_PROOF                    prove          simple_machine_tcc_proofs
sensor_step                           module         sensor_step
sensor_step_lemma                     formula        sensor_step
sensor_step_proof                     prove          sensor_step
sensor_step_tcc                       module         sensor_step_tcc
set                                   type           sets
sets                                  module         sets
simple_machine                        module         simple_machine
simple_machine_tcc                    module         simple_machine_tcc
simple_machine_tcc_proofs             module         simple_machine_tcc_proofs
simple_props                          module         simple_props
simple_props_tcc                      module         simple_props_tcc
simple_sensor_step_lemma              formula        simple_props
simple_sensor_step_lemma_TCC1         formula        simple_props_tcc
simple_sensor_step_lemma_TCC1_PROOF   prove          simple_props_tcc
simple_sensor_step_proof              prove          simple_props
simple_step_lemma                     formula        simple_props
simple_step_lemma_proof               prove          simple_props
simple_step_lemma_TCC1                formula        simple_props_tcc
simple_step_lemma_TCC1_PROOF          prove          simple_props_tcc
singleton                             literal-fn     sets
sstep                                 function       repl_machine
sstep_ax                              axiom          repl_machine
sstep_task_lemma                      formula        voted_step
sstep_task_lemma_TCC1                 formula        voted_step_tcc
sstep_task_lemma_TCC1_PROOF           prove          voted_step_tcc
sstep_task_lemma_TCC1_PROOF           prove          voted_step_tcc_proofs
sstep_task_proof                      prove          voted_step
start_cell                            const          simple_machine
state                                 type           simple_machine
stay_correct                          formula        connect
stay_correct_proof                    prove          connect
stay_correct_repl                     formula        connect
stay_correct_repl_proof               prove          connect
stay_correct_simple                   formula        simple_props
stay_simple_proof                     prove          simple_props
step                                  defined-fn     simple_machine
step_TCC1                             formula        simple_machine_tcc
step_TCC1_PROOF                       prove          simple_machine_tcc
step_TCC2                             formula        simple_machine_tcc
step_TCC2_PROOF                       prove          simple_machine_tcc
subset                                defined-fn     sets
subset_support                        formula        supports
subset_support_proof                  prove          supports
subset_union                          formula        cardinality
subset_union_proof                    prove          cardinality
support                               defined-fn     supports
support_backup                        formula        supports
support_backup_proof                  prove          supports
supports                              module         supports
supports_tcc                          module         supports_tcc
supports_tcc_proofs                   module         supports_tcc_proofs
task                                  function       simple_machine
task_fn                               type           simple_machine
the_result                            formula        correctness
torch_carried                         formula        connect
torch_proof                           prove          connect
transitive                            formula        orderedsets
twice_proof                           prove          cardinality
twice_prop                            formula        cardinality
undef                                 const          simple_machine
union                                 literal-fn     sets
unique_when                           formula        simple_machine
unique_when_proof                     prove          simple_machine
unvoted_proof                         prove          connect
unvoted_transfer_correct              formula        connect
vote                                  function       repl_machine
vote_appln                            formula        connect
vote_appln_proof                      prove          connect
vote_ax                               axiom          repl_machine
voted                                 subtype        repl_machine
voted_ax                              axiom          repl_machine
voted_ind_step_proof                  prove          voted_step
voted_inductive_step                  formula        voted_step
voted_link_lemma                      formula        voted_step
voted_link_proof                      prove          voted_step
voted_proof                           prove          connect
voted_step                            module         voted_step
voted_step_lemma                      formula        voted_step
voted_step_lemma_TCC1                 formula        voted_step_tcc
voted_step_lemma_TCC1_PROOF           prove          voted_step_tcc
voted_step_lemma_TCC1_PROOF           prove          voted_step_tcc_proofs
voted_step_proof                      prove          voted_step
voted_step_tcc                        module         voted_step_tcc
voted_step_tcc_proofs                 module         voted_step_tcc_proofs
voted_transfer_correct                formula        connect
vote_gives_maj                        formula        outputs
vote_gives_maj_proof                  prove          outputs
vote_lemma                            formula        voted_step
vote_lemma_proof                      prove          voted_step
well_founded                          formula        noetherian
when_sched_lemma                      formula        simple_machine
when_sched_proof                      prove          simple_machine
working                               literal-fn     correctness
working_agreement                     formula        outputs
working_agreement_proof               prove          outputs


Table B.1: Ehdm Identifiers used in the Specification


Results of Proof-Chain Analysis


The following listing reproduces the output from the Ehdm proof-chain analyzer in
“terse mode” applied to the formula actuators_correct in module outputs. The
Ehdm proof-chain analyzer examines the macroscopic structure of a verification,
checking that all the premises used in a proof are either axioms, definitions, or
formulas that are themselves the target of a successful proof elsewhere in the
verification. If any formulas are used from a module having an assuming clause, then
the proof-chain analyzer checks that those assumptions are discharged by successful
proofs; similarly, if formulas are used from a module having a tcc module, then
the proof-chain analyzer checks that all the tccs in that module are discharged by
successful proofs. The proof-chain analyzer ignores unsuccessful proofs (such as
automatically generated tcc proofs) when a successful proof for the same formula
can be found. The “terse mode” output reproduced here provides a commentary
on only the “interesting” cases, namely proof obligations involving assuming clauses
and tccs, and a summary. All the proofs listed in the summary were performed by
the Ehdm theorem prover in “checking mode.”
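
The check described above can be illustrated with a small sketch. This is not the Ehdm analyzer itself, only a minimal Python model of the idea under stated assumptions: the names Proof, analyze, and obligations are hypothetical, formulas are plain strings, and the tccs or assumptions attached to a formula are supplied as a simple mapping. The sketch walks from the goal through every cited premise and reports any formula that is neither an axiom, nor a definition, nor the target of a successful proof. It does not attempt the circularity checks that the real analyzer performs for termination tccs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Proof:
    """Hypothetical record of one proof declaration."""
    target: str            # formula the proof establishes
    premises: list[str]    # axioms, definitions, and formulas it cites
    successful: bool = True


def analyze(goal, axioms, definitions, proofs, obligations=None):
    """Return (complete, problems) for the proof chain rooted at goal.

    obligations maps a formula name to the tccs or assumptions that must
    be discharged whenever that formula is used (an assumed encoding).
    """
    obligations = obligations or {}
    # Keep only successful proofs; unsuccessful ones are ignored, so a
    # failed attempt does not matter if a successful proof also exists.
    by_target = {}
    for proof in proofs:
        if proof.successful:
            by_target.setdefault(proof.target, proof)

    problems = []
    seen = set()
    stack = [goal]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        # Using a formula may oblige us to discharge tccs or assumptions.
        stack.extend(obligations.get(name, []))
        if name in axioms or name in definitions:
            continue  # base of the chain, nothing further to prove
        proof = by_target.get(name)
        if proof is None:
            problems.append(name)  # used, but never successfully proved
        else:
            stack.extend(proof.premises)
    return (not problems, sorted(problems))
```

For example, with axioms {"ax1"}, definitions {"def1"}, a proof of "goal" from "lemma" and "def1", a proof of "lemma" from "ax1", and an obligation that using "def1" requires "tcc1", the chain is reported complete only if a successful proof of "tcc1" is also supplied.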
Proof chain for formula actuators_correct in module outputs

Use of the formula
  correctness.the_result
requires the following TCCs to be proven
  correctness_tcc.safe_TCC1

Formula correctness_tcc.safe_TCC1 is a termination TCC for correctness.safe
Proof of
  correctness_tcc.safe_TCC1
must not use
  correctness.safe

Use of the formula
  simple_machine.Gbar_when
requires the following TCCs to be proven
  simple_machine_tcc.sensors_TCC1
  simple_machine_tcc.actuators_TCC1
  simple_machine_tcc.active_tasks_TCC1
  simple_machine_tcc.dependency_TCC1
  simple_machine_tcc.step_TCC1
  simple_machine_tcc.step_TCC2
  simple_machine_tcc.run_TCC1
  simple_machine_tcc.run_TCC2

Formula simple_machine_tcc.run_TCC2 is a termination TCC for simple_machine.run
Proof of
  simple_machine_tcc.run_TCC2
must not use
  simple_machine.run

Use of the formula
  noetherian[simple_machine.C, simple_machine.Gbar].mod_induction
requires the following assumptions to be discharged
  noetherian[simple_machine.C, simple_machine.Gbar].well_founded

Use of the formula
  sensor_step.sensor_inductive_step
requires the following TCCs to be proven
  sensor_step_tcc.sensor_rrunto_lemma_TCC1

Use of the formula
  simple_props.simple_sensor_step_lemma
requires the following TCCs to be proven
  simple_props_tcc.simple_sensor_step_lemma_TCC1
  simple_props_tcc.simple_step_lemma_TCC1

Use of the formula
  repl_machine.rrun
requires the following TCCs to be proven
  repl_machine_tcc.R_TCC1
  repl_machine_tcc.rrun_TCC1
  repl_machine_tcc.rrun_TCC2

Formula repl_machine_tcc.rrun_TCC2 is a termination TCC for repl_machine.rrun
Proof of
  repl_machine_tcc.rrun_TCC2
must not use
  repl_machine.rrun

Use of the formula
  supports.commit_when_lemma
requires the following TCCs to be proven
  supports_tcc.foundation_TCC1

Formula supports_tcc.foundation_TCC1 is a termination TCC for supports.foundation
Proof of
  supports_tcc.foundation_TCC1
must not use
  supports.foundation

Use of the formula
  orderedsets[naturalnumber, <=].min_ax
requires the following assumptions to be discharged
  orderedsets[naturalnumber, <=].reflexive
  orderedsets[naturalnumber, <=].transitive
  orderedsets[naturalnumber, <=].antisymmetry
  orderedsets[naturalnumber, <=].dichotomy

Use of the formula
  voted_step.voted_inductive_step
requires the following TCCs to be proven
  voted_step_tcc.voted_step_lemma_TCC1
  voted_step_tcc.sstep_task_lemma_TCC1
  voted_step_tcc.rstep_task_TCC1
  voted_step_tcc.active_task_TCC1

Use of the formula
  nonvoted_step.nonvoted_task_OK
requires the following TCCs to be proven
  nonvoted_step_tcc.nonvoted_task_OK_TCC1
  nonvoted_step_tcc.nonvoted_rrunto_task_TCC1

Use of the formula
  natinduction.induction_m
requires the following TCCs to be proven
  natinduction_tcc.ind_m_proof_TCC1

Use of the formula
  noetherian[naturalnumber, natinduction.prev].general_induction
requires the following assumptions to be discharged
  noetherian[naturalnumber, natinduction.prev].well_founded

Use of the formula
  cardinality[repl_machine.R].card_prop
requires the following assumptions to be discharged
  cardinality[repl_machine.R].finite

==================== SUMMARY ====================

The proof chain is complete

The axioms and assumptions at the base are:
  cardinality[EXPR].card_ax
  cardinality[EXPR].card_empty
  cardinality[EXPR].card_subset
  naturalnumbers.nat_invariant
  noetherian[EXPR, EXPR].general_induction
  orderedsets[EXPR, EXPR].min_ax
  repl_machine.R_invariant
  repl_machine.maj_ax
  repl_machine.sstep_ax
  repl_machine.vote_ax
  repl_machine.voted_ax
  sets[EXPR].extensionality
  simple_machine.Gbar_when
  simple_machine.dependency
  simple_machine.distinct_cell_types
  simple_machine.dowhen_pos
  simple_machine.sched_when_ax
  simple_machine.sensor_ax
Total: 18

The definitions are:
  connect.a_correct_at_c
  correctness.MOK
  correctness.OK
  correctness.correct
  correctness.safe
  nonvoted_step.all_correct_at_c
  repl_machine.rrun
  sets[EXPR].empty
  sets[EXPR].subset
  simple_machine.run
  simple_machine.step
  supports.backup
  supports.foundation
  supports.support
  voted_step.induction_body
Total: 15

The formulae used are:
  cardinality[EXPR].card_prop
  cardinality[EXPR].empty_prop
  cardinality[EXPR].subset_union
  cardinality[EXPR].twice_prop
  cardinality[repl_machine.R].finite
  connect.Gbar_OK
  connect.OK_OK
  connect.notvoted_transfer_correct
  connect.r_indstep
  connect.safe_at_a
  connect.stay_correct
  connect.stay_correct_repl
  connect.torch_carried
  connect.unvoted_transfer_correct
  connect.vote_appln
  connect.voted_transfer_correct
  correctness.the_result
  correctness_proof.inductive_step
  correctness_tcc.safe_TCC1
  natinduction.induction
  natinduction.induction_m
  natinduction.limited_induction
  natinduction_tcc.ind_m_proof_TCC1
  noetherian[EXPR, EXPR].mod_induction
  noetherian[naturalnumber, natinduction.prev].well_founded
  noetherian[simple_machine.C, simple_machine.Gbar].well_founded
  nonvoted_step.link
  nonvoted_step.main_non_voted_lemma
  nonvoted_step.nonvoted_inductive_step
  nonvoted_step.nonvoted_rrunto_task
  nonvoted_step.nonvoted_task_OK
  nonvoted_step_tcc.nonvoted_rrunto_task_TCC1
  nonvoted_step_tcc.nonvoted_task_OK_TCC1
  orderedsets[naturalnumber, <=].antisymmetry
  orderedsets[naturalnumber, <=].dichotomy
  orderedsets[naturalnumber, <=].reflexive
  orderedsets[naturalnumber, <=].transitive
  outputs.actuators_correct
  outputs.rrun_gets_maj
  outputs.safe_OK
  outputs.vote_gives_maj
  outputs.working_agreement
  repl_machine_tcc.R_TCC1
  repl_machine_tcc.rrun_TCC1
  repl_machine_tcc.rrun_TCC2
  sensor_step.main_sensor_lemma
  sensor_step.sensor_inductive_step
  sensor_step.sensor_rrunto_lemma
  sensor_step.sensor_step_lemma
  sensor_step_tcc.sensor_rrunto_lemma_TCC1
  simple_machine.dowhen_previous
  simple_machine.sched_when_lemma
  simple_machine.unique_when
  simple_machine.when_sched_lemma
  simple_machine_tcc.active_tasks_TCC1
  simple_machine_tcc.actuators_TCC1
  simple_machine_tcc.dependency_TCC1
  simple_machine_tcc.run_TCC1
  simple_machine_tcc.run_TCC2
  simple_machine_tcc.sensors_TCC1
  simple_machine_tcc.step_TCC1
  simple_machine_tcc.step_TCC2
  simple_props.indstep
  simple_props.simple_sensor_step_lemma
  simple_props.simple_step_lemma
  simple_props.stay_correct_simple
  simple_props_tcc.simple_sensor_step_lemma_TCC1
  simple_props_tcc.simple_step_lemma_TCC1
  supports.Gbar_support
  supports.commit_Gbar_lemma
  supports.commit_support_lemma
  supports.commit_when_lemma
  supports.committed_lemma
  supports.found_sub_support
  supports.found_support
  supports.in_own_support
  supports.subset_support
  supports.support_backup
  supports_tcc.foundation_TCC1
  voted_step.main_voted_lemma
  voted_step.maj_lemma
  voted_step.rrunto_task
  voted_step.rstep_task
  voted_step.sensors_not_voted
  voted_step.sstep_task_lemma
  voted_step.vote_lemma
  voted_step.voted_inductive_step
  voted_step.voted_link_lemma
  voted_step.voted_step_lemma
  voted_step_tcc.active_task_TCC1
  voted_step_tcc.rstep_task_TCC1
  voted_step_tcc.sstep_task_lemma_TCC1
  voted_step_tcc.voted_step_lemma_TCC1
Total: 93

The completed proofs are:
  cardinality[EXPR].card_proof
  cardinality[EXPR].empty_prop_proof
  cardinality[EXPR].subset_union_proof
  cardinality[EXPR].twice_proof
  connect.Gbar_OK_proof
  connect.OK_OK_proof
  connect.notvoted_proof
  connect.r_indstep_proof
  connect.safe_at_a_proof
  connect.stay_correct_proof
  connect.stay_correct_repl_proof
  connect.torch_proof
  connect.unvoted_proof
  connect.vote_appln_proof
  connect.voted_proof
  correctness_proof.almost_final_proof
  correctness_proof.discharge_well_founded
  correctness_proof.final_proof
  correctness_tcc_proofs.safe_TCC1_PROOF
  natinduction.discharge
  natinduction.ind_m_proof
  natinduction.ind_proof
  natinduction.limited_proof
  natinduction_tcc.ind_m_proof_TCC1_PROOF
  noetherian[EXPR, EXPR].mod_proof
  nonvoted_step.link_proof
  nonvoted_step.main_nonvoted_proof
  nonvoted_step.nonvoted_ind_proof
  nonvoted_step.nonvoted_rrunto_task_proof
  nonvoted_step.nonvoted_task_OK_proof
  nonvoted_step_tcc.nonvoted_rrunto_task_TCC1_PROOF
  nonvoted_step_tcc.nonvoted_task_OK_TCC1_PROOF
  outputs.actuators_correct_proof
  outputs.rrun_gets_maj_proof
  outputs.safe_OK_proof
  outputs.vote_gives_maj_proof
  outputs.working_agreement_proof
  repl_machine.disharge_finite
  repl_machine_tcc.rrun_TCC1_PROOF
  repl_machine_tcc.rrun_TCC2_PROOF
  repl_machine_tcc_proofs.R_TCC1_PROOF
  sensor_step.main_sensor_proof
  sensor_step.sensor_ind_step_proof
  sensor_step.sensor_rrunto_proof
  sensor_step.sensor_step_proof
  sensor_step_tcc.sensor_rrunto_lemma_TCC1_PROOF
  simple_machine.dowhen_prev_proof
  simple_machine.sched_when_proof
  simple_machine.unique_when_proof
  simple_machine.when_sched_proof
  simple_machine_tcc.dependency_TCC1_PROOF
  simple_machine_tcc.run_TCC1_PROOF
  simple_machine_tcc.run_TCC2_PROOF
  simple_machine_tcc.step_TCC1_PROOF
  simple_machine_tcc.step_TCC2_PROOF
  simple_machine_tcc_proofs.active_tasks_TCC1_PROOF
  simple_machine_tcc_proofs.actuators_TCC1_PROOF
  simple_machine_tcc_proofs.sensors_TCC1_PROOF
  simple_props.indstep_proof
  simple_props.simple_sensor_step_proof
  simple_props.simple_step_lemma_proof
  simple_props.stay_simple_proof
  simple_props_tcc.simple_sensor_step_lemma_TCC1_PROOF
  simple_props_tcc.simple_step_lemma_TCC1_PROOF
  supports.Gbar_support_prf
  supports.commit_Gbar_lemma_proof
  supports.commit_support_proof
  supports.commit_when_proof
  supports.committed_proof
  supports.discharge_antisymmetry
  supports.discharge_dichotomy
  supports.discharge_reflexive
  supports.discharge_transitive
  supports.found_sub_support_proof
  supports.found_support_proof
  supports.in_own_support_proof
  supports.subset_support_proof
  supports.support_backup_proof
  supports_tcc_proofs.foundation_TCC1_PROOF
  voted_step.main_vote_proof
  voted_step.maj_proof
  voted_step.rrunto_task_proof
  voted_step.rstep_task_proof
  voted_step.sensors_not_voted_proof
  voted_step.sstep_task_proof
  voted_step.vote_lemma_proof
  voted_step.voted_ind_step_proof
  voted_step.voted_link_proof
  voted_step.voted_step_proof
  voted_step_tcc.active_task_TCC1_PROOF
  voted_step_tcc_proofs.rstep_task_TCC1_PROOF
  voted_step_tcc_proofs.sstep_task_lemma_TCC1_PROOF
  voted_step_tcc_proofs.voted_step_lemma_TCC1_PROOF
Total: 93